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ABSTRACT 


A  problem  that  has  plagued  the  tracking  community  for  decades  has  been  the  lack  of  a  single  metric 
to  assess  the  overall  performance  of  tracking  systems.  The  community  has  developed  many  different 
metrics  and  searched  for  methods  to  fuse  the  multiple  measures  into  an  overall  score,  with  each  selection 
of  metrics  and  fusion  method  usually  motivated  by  weakly  supported  arguments.  Without  an  overall 
metric,  it  was  difficult  to  determine  if  conflicting  differences  in  performance  from  different  metrics,  such 
as  track  length,  track  swaps,  and  track  breaks,  were  leading  to  overall  better  or  worse  performance.  For 
the  intelligence,  surveillance,  and  reconnaissance  (ISR)  community,  the  inability  to  recognize  superior 
performance  from  trackers  meant  that  it  was  difficult  to  acquire  the  right  trackers  for  specific  applications 
and  to  prove  that  performance  was  improving  as  trackers  were  tuned  or  new  trackers  acquired. 

The  authors’  prior  research  has  identified  total  conditional  entropy  as  a  useful  measure  to  assess  the 
overall  performance  of  multi-target  trackers  and  classifiers.  The  measure  can  be  used  to  evaluate  any 
system  that  can  be  formulated  as  an  assignment  algorithm  that  maps  N  classes  of  objects  to  M  labels.  The 
assignments  from  the  decision  system  are  compared  to  a  truth  data  set  to  generate  the  total  conditional 
entropy.  Two-dimensional  plots  of  (1)  mutual  information  normalized  by  the  truth  entropy  and  (2)  the 
conditional  entropy  of  the  sensor  data  given  the  truth  data  normalized  by  the  truth  entropy  can  be  used  to 
construct  plots  similar  to  the  receiver  operational  characteristic  (ROC)  plots  of  detection  theory.  These 
plots  are  useful  for  graphically  depicting  the  performance  differences  between  tracking  algorithms. 

The  authors’  prior  research  neglected  to  provide  error  bounds  on  the  information  theoretic 
measures,  which  are  necessary  to  estimate  the  statistical  significance  of  results  from  tests  on  competing 
trackers  and  classifiers.  This  report  focuses  on  work  conducted  to  generate  error  bounds  for  this  purpose. 
While  reviewing  literature  to  identify  earlier  work  on  error  estimation  for  information  theoretic  measures, 
the  papers  of  Wolpert  and  Wolf  were  found  to  provide  exact  equations  for  calculating  the  first-  and 
second-order  statistics  for  the  three  fundamental  entropy  measures  from  sample  data.  The  associations  in 
the  sample  data  are  counted  to  get  the  number  of  times  that  objects  in  class  N  are  assigned  labels  in  class 
M.  The  authors’  direct  conversion  of  the  Wolpert  and  Wolfs  equations  into  computer  code  produced 
estimates  that  were  valid  up  to  thousands  of  samples  before  numerical  errors  corrupted  the  results.  As  the 
sample  size  was  increased,  the  difference  of  increasing  and  increasingly  similar  parameter  values  caused 
the  numerical  accuracy  to  dramatically  degrade.  The  authors  have  restructured  Wolpert  and  Wolfs 
equations  to  get  stable  numerical  results  up  to  sample  sizes  in  the  billions.  This  report  presents  results 
from  two  steps  in  the  process  to  produce  numerically  stable  computer  code  that  the  authors  believe  can 
estimate  information  theoretic  means  and  statistical  errors  for  any  sample  size.  This  code  is  provided  as  a 
MATLAB  software  package  available  from  MIT  Lincoln  Laboratory. 
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1.  INTRODUCTION 


A  problem  that  has  plagued  the  tracking  community  for  decades  has  been  the  lack  of  a  single  metric 
to  assess  the  overall  performance  of  tracking  systems.  The  community  has  developed  many  different 
metrics  and  searched  for  methods  to  fuse  the  multiple  measures  into  an  overall  score,  with  each  selection 
of  metrics  and  fusion  method  usually  motivated  by  weakly  supported  arguments.  Without  an  overall 
metric,  it  was  difficult  to  determine  if  conflicting  difference  in  performance  for  trackers,  such  as  track 
length,  track  swaps,  and  track  breaks,  were  leading  to  overall  better  or  worse  performance.  For  the 
intelligence,  surveillance,  and  reconnaissance  (ISR)  community,  the  inability  to  recognize  superior 
performance  from  trackers  meant  that  it  was  difficult  to  acquire  the  right  trackers  for  specific  applications 
and  to  prove  that  performance  was  improving  as  trackers  were  tuned  or  new  trackers  acquired. 

Research  conducted  by  the  authors  over  the  last  few  years  has  identified  a  new  technique  to 
evaluate  the  performance  of  multi-target  trackers  [1],  The  technique  uses  information  theory  to  estimate 
the  mutual  information  and  conditional  entropies  for  a  set  of  truth  tracks  and  a  set  of  tracks  generated  by  a 
tracking  algorithm.  The  performance  of  the  tracking  algorithm  is  then  assessed  with  the  sum  of  the 
conditional  entropies  between  the  true  tracks  and  those  produced  by  the  tracking  algorithm.  The  technique 
is  most  effective  for  comparative  studies  of  the  relative  performance  of  multiple  trackers  and  classifiers. 
The  technique  is  also  useful  as  a  way  to  optimize  the  parameters  of  individual  trackers  by  identifying  the 
parameter  values  that  minimize  the  total  conditional  entropy.  The  technique  is  general  enough  that  it  can 
be  used  to  evaluate  classifiers  and,  by  extension,  any  comparison  where  an  N-to-M  assignment  can  be 
made  between  a  decision  system  and  truth  data  [2]. 

The  authors’  prior  publications  neglected  to  consider  the  estimation  of  statistical  errors  for  the 
information  theoretic  measures.  The  errors  are  needed  to  estimate  the  statistical  significance  of  the  results 
from  the  assessment  of  tracking,  classification,  and  decision  systems  when  algorithms  are  being  compared 
with  each  other.  A  literature  search  for  prior  work  on  the  estimation  of  information  theoretic  measures 
and  error  estimates  turned  up  the  papers  of  Wolpert  and  Wolf  [3-5].  Wolpert  and  Wolfs  papers  provide 
equations  to  estimate  statistical  parameters  from  finite  samples.  They  used  a  Bayesian  analysis  to  derive 
closed  form  expressions  for  a  number  of  statistical  estimators,  including  an  exact  derivation  of  the  first- 
and  second-order  statistics  terms  for  the  basic  information  theoretic  entropy  functions.  The  equations 
presented  in  the  original  papers  were  principally  intended  for  the  estimation  of  statistical  parameters  from 
small  sample  sizes.  The  equations  are  able  to  account  for  prior  information  in  the  statistics  estimates, 
including  various  proposals  for  uninformative  priors,  including  Haldane’s  prior  (0),  Perks’  prior 
(l/(n  x  m))  [6],  Jeffrey’s  prior  (1/2),  and  Bayes’  or  the  uniform  prior  (1)  [7].  The  authors’  information 
theoretic  equations  originally  did  not  account  for  prior  probabilities,  and  the  adoption  of  Wolpert  and 
Wolfs  equations  provide  the  means  to  account  for  prior  probabilities. 

In  the  course  of  implementing  Wolpert  and  Wolfs  equations  for  the  numerical  estimation  of 
information  theoretic  measures  with  MATLAB,  a  few  publication  errors  were  found  in  the  papers.  After 
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correcting  the  equations,  the  authors’  initial  computer  implementation  was  unexpectedly  found  to 
encounter  machine  precision  problems  around  sample  sizes  of  107  samples.  While  this  data  sample  size 
limit  is  more  than  sufficient  for  evaluating  most  classifiers,  this  number  is  potentially  problematic  for  the 
evaluation  of  multi-target  trackers.  This  is  because  the  total  number  of  samples  for  multi-target  trackers  is 
defined  by  the  size  of  the  state  space  that  the  trackers  can  potentially  report.  The  sample  size  is  usually 
defined  by  the  number  of  cells  in  the  state  space,  which  is  calculated  from  the  product  of  the  ratios  of  the 
range  and  the  typical  resolution  of  the  measurements  of  each  dimension.  Sample  sizes  can  easily  range  in 
the  billions  or  higher  for  many  data  sets,  with  most  of  the  counts  assigned  to  the  count  of  true  negatives 
(an  easily  overlooked  and  often  implied  assumption  in  multi-target  tracking  is  that  a  track  is  presumed  to 
not  exist  if  a  tracker  does  not  report  it). 

This  report  begins  with  a  summary  of  Wolpert  and  Wolfs  information  theoretic  functions, 
corrected  for  the  typographical  errors  that  appear  in  the  papers.  Performance  results  on  the  stability  of  a 
naive  coding  of  Wolpert  and  Wolfs  are  next  presented  to  show  that  the  numerical  accuracy  of  the 
information  theoretic  variances  and  covariances  becomes  a  severe  problem  for  sample  sizes  on  the  order 
of  a  million.  The  report  continues  with  two  reformulations  of  the  second-order  statistics  functions  in  an 
evolutionary  process  to  produce  software  that  estimates  the  information  theoretic  variances  and 
covariances  with  better  numerical  accuracy.  Analysis  results  on  the  numerical  stability  are  presented  for 
two  of  the  series  of  reformulations.  The  first  reformulation  presented  in  this  report  produced  software  that 
was  accurate  up  to  the  order  of  1012  samples  and  the  second  reformulation  up  to  the  order  of  1015  samples. 
Given  computer  code  used  double  precision  arithmetic  with  an  accuracy  of  1016,  the  final  reformulation  is 
believed  to  be  sufficiently  precise  that  it  will  estimate  reasonably  accurate  variances  and  covariances  for 
all  sample  sizes.  Examples  are  presented  to  show  the  use  of  the  final  computer  code  in  the  analysis  of  the 
relative  performance  of  multi-target  trackers  and  classifiers.  The  final  section  in  the  report  provides  a 
description  of  a  software  package  that  is  available  for  download  from  MIT  Lincoln  Laboratory. 
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2.  EVALUATION  OF  MULTI-TARGET  TRACKERS  AND  CLASSIFIERS 


The  three  most  basic  information  measures  for  pairs  of  samples  {x,  y}  are  the  entropies  for  the 
individual  terms,  H{x)  and  //(>') ,  and  the  entropy  for  the  combined  terms,  H(x,y).  From  these,  the 


mutual  information  between  x  and  y  can  be  calculated: 

l(x,y)  =  H(x)+H(y)-H(x,y).  (1) 

The  two  conditional  entropies  can  be  calculated  from  the  relationships  [8], 

H(x\y)=  H(x,y)~  H(y),  and  (2) 

H(y\x)=H(x,y)-H(x).  (3) 

The  performance  score  recommended  by  the  authors  is  then  given  by  the  sum  of  the  conditional  entropies, 
H(y  \x)+H(x\y)  =  2 H(x, y) -  H(x) -  H(y) .  (4) 

Smaller  values  are  better,  with  zero  being  the  ideal  performance  point  for  a  tracking  system. 


With  a  truth  data  set  that  can  be  processed  by  an  algorithm,  an  accumulation  matrix,  r. ,  can  be 
constructed  that  counts  the  number  of  times  that  subset  i  of  the  truth  enumerations  associates  with  a 
subset  j  of  the  output  of  the  tracking  or  classification  algorithm.  The  naive  approach  to  estimating 
entropy  measures  from  counts  is  to  convert  the  entries  in  the  accumulation  matrix  into  probabilities 
by  normalizing  the  accumulation  matrix  with  the  sum  of  the  total  matrix,  for  example, 


The  naive  probabilities,  pt  and  o  ■ ,  can  also  be  calculated  by  summing  the  rows  or  columns  of  r..  before 
using  the  pair  of  functions  equivalent  to  Equation  (5).  The  entropy  can  then  be  naively  estimated  with 

R{x,y)  =  -YJPy^(py),  (6) 

Uj 

The  other  entropies,  H{x)  and  H(y) ,  can  be  estimated  with  equivalent  functions  based  upon  pt  and 
Pj.  The  conditional  entropies  and  the  total  conditional  entropy  can  be  estimated  by  the  appropriate 
combination  of  the  three  entropies.  The  advantage  of  the  naive  approach  is  that  the  calculations  of  the 
mutual  information  terms  are  simply  estimated  from  the  accumulation  matrix.  However,  this  approach 
fails  to  account  for  prior  probabilities  and  produces  biased  estimates  when  the  sample  size  is  small. 
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In  addition  to  adopting  total  conditional  entropy  as  the  overall  performance  measure  for  tracker  and 
classifier  evaluation,  the  authors  have  defined  terms  to  graphically  show  the  performance  of  multi-target 
trackers  and  classifiers  in  a  space  that  has  similar  characteristics  to  the  receiver  operation  characteristic 
(ROC)  plots.  These  plots  show  the  probability  of  detection  versus  probability  of  false  alarm  or  false  alarm 
rate  and  are  often  used  to  visualize  the  performance  of  detection  algorithms.  The  (1)  ratio  of  mutual 
information  to  truth  entropy  and  the  (2)  conditional  information  of  the  sensor  data  given  the  truth  divided 
by  the  truth  entropy  are  used  as  the  y-  and  x-axes  for  the  graphs.  The  y-axis  is  referred  to  as  information 
completeness  and  the  x-axis  is  referred  to  as  the  false  information  ratio.  The  y-axis  values  range  from  zero 
to  one,  while  the  x-axis  values  are  non-negative  with  no  definite  upper  limit.  The  normalization  by  the 
truth  entropy  is  used  so  that  the  y-axis  values  depict  the  fractional  amount  of  true  information  reported  by 
the  decision  algorithm.  The  plotted  values  are  fairly  stable  against  different  approaches  to  estimating  the 
information  theoretic  measures.  This  sensitivity  is  examined  in  detail  in  Section  8.  Figure  1  shows  an 
example  of  the  information  measures  for  two  simulated  tracker  data  sets.  The  authors  refer  to  the  plot  as 
an  information  coverage  plot.  The  dotted  diagonal  lines  are  isobars  for  the  total  conditional  entropy 
divided  by  the  total  truth  entropy.  The  sum  of  information  completeness  plus  the  ratio  of  the  conditional 
entropy  of  the  truth  data  given  the  sensor  to  the  truth  entropy  is  one.  The  ideal  sensor  produces  zero  total 
condition  entropy,  which  would  be  plotted  at  the  position  (0,1)  in  Figure  1.  The  reader  could  obtain  a 
sense  of  the  significance  of  the  differences  between  the  two  trackers  in  Figure  1  if  error  ellipses  were 
plotted.  For  this  simulated  data  set,  the  total  conditional  entropy  for  Tracker  1  is  0.0142  and  for  Tracker  2 
is  0.0216.  Tracker  1  is  better  than  Tracker  2,  but  without  error  estimates  it  is  difficult  to  determine  the 
significance  of  the  test  results. 
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Figure  1.  An  example  of  an  information  coverage  plot  based  upon  data  from  two  simulated  trackers. 


If  means,  p,  and  standard  deviations,  a,  for  the  total  conditional  entropy  can  be  calculated  for 
multiple  tracking  or  classification  systems  under  test,  a  rough  statistical  test  can  be  performed  to  assess 
the  reliability  of  the  decision  that  one  system  performs  better  than  another.  If  it  is  assumed  that  the 
probability  distribution  functions  for  the  true  values  for  the  total  conditional  entropy  of  the  trackers  can  be 
described  by  Gaussian  functions  centered  on  the  estimated  means  and  with  spreads  characterized  by  the 
standard  deviations,  then  a  reasonable  test  statistic  is 


1 


2  of 


2  h 


J 


f 

exp  ■ 

V 


2  N 

-  dxxdx2  . 

y 


(7) 


This  integral  is  the  probability  that  the  true  (and  unknown)  value  of  the  total  conditional  entropy  for 
Tracker  2,  x1T ,  is  less  than  the  true  value  for  Tracker  1,  x]T .  The  test  is  a  rough  estimate  because  the 
actual  information  theoretic  probability  distributions  are  not  Gaussian  and  the  information  theoretic 
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measures  are  always  non-negative  values.  Even  with  these  limitations,  Equation  (7)  can  still  provide 
useful  estimates  of  the  statistical  significance  of  the  results  from  comparisons  of  trackers  and  classifiers. 


For  actual  use  in  calculations,  the  integrals  in  Equation  (7)  can  be  simplified  with  the  integral  [9] 


uu 

|  G'(x)G(a  +  bx)dx  =  G 


a 


VT +P 


(8) 


where 


1  (  r2^ 

Gf(x)  =  _ exp 

v 


V  2  j 


and 


1  r 

G(x )  =  . _  [  exp 

v2 n  J 


-oo  \  J 


The  result  is  that  Equation  (7)  can  be  reduced  to 


T  =  G\ 


f  \ 

Mi  ~Mi 


v 


^/crf  +cr“ 


2  J 


(9) 


(10) 


(11) 


The  remainder  of  the  report  works  through  the  effort  to  develop  computer  code  to  estimate  means 
and  standard  deviations  so  that  the  statistical  significance  of  information  theoretic  evaluations  of  trackers 
and  classifiers  can  be  estimated  for  any  test  data  set. 
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3.  DERIVATIONS  FOR  COMPUTER  CALCULATIONS 


The  papers  of  Wolpert  and  Wolf  provide  exact  formulae  for  estimating  a  number  of  first-  and 
second-order  statistics  terms  from  which  means  and  standard  deviation  can  be  obtained.  The  primary 
information  theoretic  derivations  of  interest  for  this  report  are  the  joint  entropy  and  the  two  independent 
entropies  for  two  quantized  variables,  x  and}',  as  well  as  the  variance  and  covariance  terms  that  are 

associated  with  the  three  entropies  of  Equations  (1)  through  (3).  Given  estimates  for  the  three  entropies 
and  associated  error  terms,  the  values  for  mutual  information  and  conditional  entropies  and  associated 
error  terms  can  be  calculated.  It  should  be  noted  that  the  original  papers  of  Wolpert  and  Wolf  contained  a 
few  typographical  errors  that  have  been  corrected  here.  A  summary  of  the  corrected  results  of  their 
derivations  are  provided  below. 

3.1  CALCULATION  OF  THE  ENTROPY  MEASURES 

The  relationship  between  Wolpert  and  Wolfs  variables  and  the  expectation  values  for  the  three 


entropies  are 

E(H(x,y))  =  -Su,  (12) 

E(h(x))  =  -Si,  and  (13) 

E{H{y))  =  -& (14) 

The  variables,  S-,  S-  and  S-,  in  Equations  (12)  through  (14)  are  derived  by  Wolpert  and  Wolf  to  be 

+  (15) 

i,j  V 

S7=2^A®'"(r,,+l,r  +  l),and  (16) 

i  V 

=Z“^A°(l)(V/v  +  1’V  +  1)-  (17) 


The  variables  in  Equations  (15)  through  (17)  are  defined  as  follows:  The  measurements  of  interest 
for  the  estimation  of  the  entropies  are  the  prior  adjusted  sample  size  v  ,  the  prior  adjusted  number  of  co¬ 
occurrences  of  event  i  and  event  j  ,  vi; ,  the  prior  adjusted  number  of  occurrences  of  event  i ,  vim ,  and  of 
event  j  ,  v  . .  Wolpert  and  Wolf  considered  a  number  of  different  forms  for  prior  probabilities  in  their 
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Bayesian  derivation  of  statistical  measures.  In  this  report,  the  forms  of  interest  for  the  prior  probabilities  is 
restricted  to  the  consideration  of  those  that  are  of  the  form 


P(p)x  /\v\\pri‘  .  (18) 

i=i 


The  Dirichlet  prior  is  a  special  case  of  Equation  (18)  where  all  prior  constants,  r, ,  are  the  same 
value,  r .  Combining  the  sample  counts  with  the  prior  counts  produces  the  prior  adjusted  counts  for  the 
variables  in  Equations  (15)  through  (17): 


V  = 

-  n  + 

SS 

l  j 

V.. 

=  n 

+  r. . , 

V 

V 

v 5 

K: 

=  nu 

+  r 

»  1  '/•  5 

V-J 

=  n. 

J+r-j 

(19) 

(20) 

(21) 

(22) 


where  n  is  the  total  sample  size,  is  the  count  of  samples  in  class  i ,  «  is  the  count  of  samples  in 
class  j  ,  and  ntj  is  the  count  of  samples  in  class  i  and  j  . 

Hypergeometric  functions,  r(z)  (or  equivalently  named,  gamma  functions),  appear  in  a  number  of 
the  formulae.  The  digamma  function,  T/I0|(z) ,  is  defined  to  be  the  first  derivative  of  the  log  of  the  gamma 
function.  For  any  power  of  derivatives,  the  generalization  of  the  digamma  function  is 

T('!)(z)  =  5:+1ln(r(z)).  (23) 

In  this  notation,  n  is  one  less  than  the  order  of  differentiation,  as  defined  in  conventional  practice. 
Wolpert  and  Wolf  defined  the  following  function  to  simplify  the  representation  of  terms  in  their 
equations: 


0(")(z1)  =  vF(”_l)(z). 


(24) 


The  function  A ®^(zl5z2)  was  thus  defined  to  be 

AO('")(z1,z2)  =  cD(")(z1)-  O )(z2),  (25) 

and  is  the  last  of  the  definitions  needed  for  Equations  (15)  through  (17). 


The  second-order  statistics  for  the  three  fundamental  entropies,  H(x,y),  //(x) ,  and  H(y),  require 
six  functions.  Two  pairs  of  the  functions  are  symmetric  under  interchange  of  x  andj,  so  only  four 
functions  need  to  be  explicitly  provided.  The  two  additional  functions  are  obtained  by  exchanging  x  and 
y  variables. 

The  second-order  statistics  term  for  e(i  f(x,y)l  f(x,y)) ,  or  &UMN  ,  as  defined  by  Wolpert  and  Wolf, 
is 


IJMN 


TI 

i,j  m,n^i,j 

V 


VVVnm 

v(v  + 1) 


{aO(i)(ia.  +l,v  +  2)A^l\vmn  +1,v  +  2)-0(2)(v  +  2)} 


+  &<S>%t -  +2.V  +  2)}  . 
The  second-order  statistic  term  for  E{H{x,y)H{x  ))  ’or  6W’is 


(26) 


S7777  = 


IJM 


+  ^  +  2Wl)(ym.  +l,v  +  2)-^2){v  +  2)} 


i,j  m^i 


+  A0(l)(vy  + 1,  v  im  +  i)a0(i)(i/,..  +  2,  v  +  2)  +  AO(2)(i/,..  +2,v  +  2)}. 


(27) 


The  second-order  statistic  for  E(h{x,  y)H(y)),  or  can  be  obtained  by  substituting  •  n  for  m  • ,  and 
n  for  m  ,  with  the  summation  modified  to  be  over  n  ,  excluding  j  instead  of  i . 

The  second-order  statistic  term  for  E(h(x)h(x  ))  ,or  S-,is 

=  S^f%{AO(l)(v,,  +l,v  +  2)A0(1)(v,„.  +  l,v  +  2)-  O  W(v  +  2)} 


£  Vi'}{X\)  {tA°(l)(V--  +  2> v  +  2)f  +  A0(2)(v,,  +  2,  v  +  2)|. 


(28) 


The  second-order  statistic  for  E(H{y)H{y)) ,  or  can  be  obtained  by  substituting  i  •  for  •  j ,  m  •  for 
•  n ,  and  swapping  i  for  j  and  n  for  m  . 

The  second-order  statistic  term  for E(H(x)H(y)) ,  or  S/v ,  is  the  most  complicated  of  the  four 
expressions,  partly  because  it  contains  terms  with  infinite  sums: 
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%  =ZV"iiV'll^{{[M)(')(^  +  2W  +  2)]2  +  M)(2,(^  +  2,  V  +  2) 


v(v  + 1) 

\  fe.  -  W,  )  +  0.,,  -  W,  )  ,  0>  -  W,  \v.n  -  Yin  ) 

vm^yin  +1) 


V 


V,„ 


+  AcD(l)(v.,  +  2,u  +  2)x 


I 


Qk,  i) 


r=0 


r! 


{v.h  ~Vin)r 


Mr 


l  — 


V  Vin+r  J 


A  0*.  A 
v  ^«+ry 


Om), 


r= 0  5=0 


Of.  ~  Vjn  ),  Q.„  ~  Vin  1  Q,  fo)  g|  Ml 

/-!  *!  I 


(29) 


This  equation  is  a  corrected  for  typographical  errors  in  the  papers  of  Wolpert  and  Wolf. 

The  variables  and  nomenclature  in  Equation  (29)  that  have  not  been  previously  defined  are 
v;t,  =  vim  +  v.„  -  v,„  and  (30) 

(31) 


in  f  •«  in 


( a)b  =  r(a  +b)/T(a), 


where  ( a)b  is  Pochhammer’s  symbol  [10]  and  is  used  to  represent  the  ratio  of  two  gamma  functions.  In 
this  usage,  it  is  a  rising  factorial  as  opposed  to  a  falling  factorial. 


Lastly, 


(-1)'  ^  1 


Qi  0>  Vi )  =  [i  -  ©O'  -Vi- 1)]?2 — x  Yj - + ©o  -*ii  -  !X-  r0  -rh)- 

Vh  -J/Zo’h  ~r 

The  Heaviside  theta  function  in  Equation  (32)  is  defined  as 
©(x>0)  =  l,  ©(x  <  0)  =  0 . 


(32) 


(33) 


The  expansion  of  the  Qx  function  is  necessary  for  eventual  implementation  of  S/v  in  computer 
software.  For  the  (9,  (/,//, )  functions  are  only  evaluated  for  1}x=\.  This  limits  the  functions  of 
interest  to 


e,  0,i) = [i -©0-2)] 


(-iy 


0-2) 


y-i  i 

y— 

^  1  -  r 


1-0 


+  ©0-2)r0-i)  - 


(34) 
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By  standard  usage,  the  sum  is  defined  to  be  zero  for  upper  limits  less  than  lower  limits.  The  three  classes 
of  interest  for  the  expansion  of  the  Qx  function  are 


fl(0,l)  =  0;  y  =  0, 

(35) 

Q\  (l,l)  =  -1 ;  7  =  1,  and 

(36) 

0,(7,l)=r(7-l);  7  >2. 

(37) 

Thus,  the  j  =  0  terms  in  all  infinite  sums  can  be  ignored.  Because  the  Q,  terms  are  always  associated 
with  division  by  a  gamma  function,  the  following  simplifications  are  useful: 


a(u) 

1! 


-i;  .7  =  1 


(38) 


OM  r(y-i)  (j- 2)  l  . 
./'■  j-  j!  j{j~  l)’ 


(39) 


The  single -term  infinite  sums  can  be  redefined  as  infinite  sums  of  the  form, 


L  y  y 

K 

1 

H 

1 _ 

f  v  — V ■ 

Y  y  m 

v  x  5  V y  5  5  ,  . 

^0  r! 

1 

S 

>8 

{  Vin+r  }_ 

and  rewritten  as 


s\vx,Vy,vin,vin)  =  - 


(Vx-VinX 


M 1 

00  1 

V—1 _ 

^2  r(r  ~ !) 


f  V  -V  ' 
1-  '  1 


V 


Vfn  +  1  j 


(w  -yjr 


{Vin) 


\  Vy-Vin ' 
r  V  Vin+r  J 


With  the  use  of  Equation  (31),  the  function  can  be  written  as 

r(vI-v!,+i)r(v;,)( 


S\{vx,Vy,vin,Vi  „)  = 


r(w  -w,)r  (vin  +i) 

1  (Vx  ~  Vin  ), 

h  rir  - 1)  ), 


-=  ^  v  -v.  5 

l  _  y  m 
\  Vin  +  1  J 


\  vy~yin ' 
>  V  Vin+r  J 


(40) 


(41) 


(42) 
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The  use  of  the  recursion  relationship 

r(a  + 1)  =  «r(fl) 


(43) 


simplifies  the  function  to 


Si(vx,vy,vin,vin)=-^x _  Vin^ 


f  V  -v  ^ 
1-  v  1 


in  V  Vm  +1  y 
-  V,n  ) 


GO  1 

y  1 

^  Kr  -  0 


(V,n  ) 


>  v  ^+ry 


Equation  (30)  can  be  used  to  get 

{vx  -yin\yx+ 1) 


,(Vin  +1) 


r= 2 


1 

!< 

K 

1 

* 

i _ 

ir  -  0 

Mr  { 

V  —  V 
|  _  y  m 

V-  +  T 


The  double  summation  can  be  redefined  with  the  function 


S1{vl.^.n,vin,yin)  =  YJYu 


;•= 0  s=0 


n  ),• 

(k„  -Vi J 

1 

[v. 

V  in 

)r+5 

i 

r\  j 

?! 

and  expanded  to 

s2(vi.,v.n,vln,v1n)  = 


^  {vi.~M i(v 

•n 

(vj 

V  (v<*  “v 

>2 

in  )l  ^ 

fa  a  (5,1) 

A 

1 

• 

1 

fai 

w  )r 

fa  ^ 

fa,  -  vf„  )i  Ql  (r ,l) 
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00  00  fi/ 
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fa  r! 

J,.fa  -nJ,  a  fa)  &  fa) 

r=2  5=2  ' 

fa  fa,  r!  *! 

as  was  done  for  the  single  sums. 


(44) 


(45) 


(46) 


(47) 


12 


Using  a  similar  approach  to  that  used  for  the  equations  of  Sl ,  Equation  (47)  can  be  rewritten  as 


S  (v  v  v  v  )  -  V‘n  v>n  1 

^ 2  V  !•  ’  ’•n  ’  *  in  •>Vin)  \ 

(uJ2 


(  Vm )«  i 

-(u.  -KnJZj— 7=-i - ? — u 

^  (uJi«  4*-i) 

/  IV-1  l17!*  ~  Vin  ),■  1 


zz 


(v  —  V  )  (v  —  V .  )  1  1 

\  i •  in  Jr  V  *n  in  J s  1  1 

(uJ,.+J  r(r-l)4s-l)’ 


which  means  that  6„.  can  be  rewritten  as 


%  =  £  V^;+\}  {{[AO(l)(i7;.„  +  2,V  +  2)]2  +AO(2)(i7,  +2,v  +  2)} 
„  "|  (u.  -  )  +  (u„  -  U„  )  ,  (u.  -  Vin  Xu,,  -  Vj„  )N 

V  Vin  VJVm  +  1) 

+  AO (l  1  (vin  +  2,  V  +  2)[Sj  (vim ,  v.„ ,  V,.„ ,  vin )  +  S,  {v.n ,  vt.,vin ,  vin )] 

+s2(u.,^.«,u«,uJ}- 


The  above  function  was  coded  in  MATLAB  to  produce  the  first  version  of  an  algorithm  to 
numerically  estimate  the  means  and  covariances  of  the  three  entropy  measures. 

3.2  CALCULATION  OF  ADDITIONAL  INFORMATION  THEORETIC  MEASURES 

The  mutual  information  and  conditional  entropies  can  be  estimated  from  the  three  entropy  values, 
H(x,y),  H(x),  and  II  (y  ) .  The  mutual  information  is  given  by 

l{x,y)  =  H(x)  +  H{y)-H(x,y)  =  SJi  -S7  -%  .  (50) 

The  two  conditional  entropies  are  given  by 

H{x  |  y)  =  H(x,y)-H(y)  =  &-  -  S-  and  (51) 

H{y\x)  =  H(x,y)-H(x)  =  &1  -%  .  (52) 

The  sum  of  the  conditional  entropies  is  used  by  the  authors  as  an  overall  score  for  trackers  and 
classifiers, 
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H(x\y)  +  H(y\x)=2H(x,y)-H(y)-H(x)  =  S1+S-/  -2S-_  (53) 

The  second  moment  terms  for  these  additional  variables  are  shown  in  Table  1  and  are  needed  to 
estimate  the  variances  and  covariances  for  all  six  information  theoretic  measures. 


TABLE  1 

A  Table  of  Second-Order  Statistics  Functions  for  Common  Information 

Theoretic  Variables 


Terms 

l(x,y) 

H(x\y) 

H{y  |  x) 

H(x  1  y)+H(y  |  x) 

H(x,y) 

8 —  "T  8 —  —  8 - 

IJM  IJN  IJMN 

^ IJMN  ^ IjN 

8 - -8 — 

IJMN  IJM 

28 - -  8—  -  8 — 

IJMN  IJN  IJM 

H(x) 

® 1m  +  ®  In  ~  ^ Tjm 

&—-S- 

IJM  IN 

^  UM  ~  ^  lM 

28 — -8--8— 

IJM  IN  IM 

H(y) 

8 —  +6> —  —  8 — 

IN  JN  IJN 

8— -8- 
IJN  JN 

8— -8- 
IJN  IN 

^ UN  ~  ^  JN  ~  ^7 N 

i{x,y) 

® IJMN  +  +  ^ JN 

-  i(s — + S —  -  S— ) 

'  IJM  IJN  INI 

~gim+2sm 

+  8 -  —  8 —  —  8 — 

IJM  IN  JN 

—  8 - +  2,8 —  +  8 — 

IJMN  IJM  IJN 

-8- -8— 

IN  IM 

-28 — +  38—  +  38 — 

IJMN  IJN  1JM 

-28- -8—  -8— 

IN  JN  IM 

H(x\y) 

®  IJMN  +  ^  JN  ~  ^  UN 

8 —  —  © —  —  8 —  +  8 — 
IJMN  UM  IJN  IN 

2SUMN+SJN-3SW 

-&1JM+Sm 

H(y  |  x) 

8 - +  8 —  —  28 — 

IJMN  IM  IJM 

2gUMN+&IM-3&IJM 

~6un+&in 

H(x\y)+H{y\x) 

+  %+2%+^ 
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4.  NUMERICAL  RESULTS  FROM  THE  INITIAL  ALGORITHM 


To  evaluate  the  performance  of  the  initial  algorithm,  a  test  accumulation  or  confusion  matrix  was 
used  as  input  to  the  function.  The  accumulation  matrix  that  was  chosen  for  numerical  evaluation  was 
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.00 
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.07 

.18 

.00 

.23 

.03 

.13 

.00 

.38 

.08 

.13 

.00 

.03 

.02 

.18 

.07 

.12 

.45 

(54) 


and  comes  from  a  paper  by  Farhadi  et  al.  [11].  For  the  evaluation  here,  the  matrix  was  scaled  by  different 
factors  to  evaluate  the  accuracy  of  the  resultant  means  and  covariances  for  different  simulated  sample 
sizes.  Figure  2  shows  how  the  covariance  estimates  change  with  sample  size.  Negative  eigenvalues  are 
indicated  by  the  red  circles.  It  can  be  seen  that  one  of  the  eigenvalues  in  the  covariance  matrix  becomes 
negative  above  107,  which  is  clearly  incorrect  because  all  eigenvalues  are  required  to  be  positive  for 
covariance  matrices.  Rounding  errors  are  so  severe  at  these  sample  sizes  that  negative  eigenvalues  are 
calculated.  Greater  numerical  precision  in  the  calculations  is  needed  to  obtain  accurate  estimates  from  the 
calculations  (Flints  that  there  were  publication  errors  in  the  Wolpert  and  Wolf  papers  originally  came 
from  examining  the  eigenvalues  of  the  covariance  matrices  for  the  three  entropies).  The  correct  trend 
would  be  for  the  eigenvalues  to  continue  to  decrease  approximately  log-linearly  as  the  sample  size 
increases.  The  term  that  suffered  the  most  from  numerical  precision  problems  in  the  initial  coding  of  the 
algorithm  was  S— . 


The  terminology  for  this  report  is  to  use  the  phrase  ‘accumulation  matrix’  to  refer  to  matrices  that 
are  sometimes  called  ‘confusion  matrices’  in  the  context  of  the  evaluation  of  classifiers.  The  phrase 
‘association  matrix’  is  used  to  refer  to  matrices  constructed  to  determine  the  optimal  association  between 
sets  of  tracks.  The  association  matrices  are  used  to  optimally  associate  truth  and  estimated  tracks  at 
individual  frames  or  time  steps.  The  accumulation  matrix  is  used  to  count  the  number  of  times  that  pairs 
of  tracks  optimally  associate  between  the  two  track  data  sets. 


15 


Scaled  sample  counts 


Figure  2.  The  absolute  values  of  eigenvalues  from  the  covariance  matrices  of  entropies  for  different  scale  factors  of 
the  accumulation  matrix  in  Equation  (54)  using  the  initial  algorithm.  The  x-axis  is  the  scaled  sample  counts  in  the 
matrix.  Negative  eigenvalues  are  shown  with  red  circles  to  indicate  where  the  covariance  terms  are  incorrectly 
estimated. 
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5.  DERIVING  COVARIANCE  EQUATIONS  FROM  FIRST-  AND  SECOND- 

ORDER  EQUATIONS 


There  were  a  number  of  causes  for  the  limited  precision  in  the  numerical  estimates  from  the  first 
algorithm.  First,  the  variance  and  covariance  terms  were  numerically  calculated  from  the  difference 
between  second-order  statistics  terms  and  squares  of  first-order  statistics  terms.  In  general,  as  sample 
sizes  increase,  the  second-order  statistics  terms  converge  toward  the  square  of  the  first-order  statistics 
terms  because  errors  decrease  as  sample  sizes  increase.  The  numerical  estimates  of  the  variances  and 
covariances  become  more  and  more  inaccurate  as  the  algorithm  calculates  differences  between  pairs  of 
increasingly  large  and  similar  numbers.  Analytically  deriving  the  covariance  formulae  from  the  difference 
between  second-  and  first-order  statistics  terms  and  coding  that  resultant  function  was  the  first  step  used 
to  make  the  calculations  less  sensitive  to  machine  precision  errors.  This  reformulation  produced  an 
algorithm  with  equations  that  accounted  for  most  of  the  cancellation  between  terms  before  any  numerical 
calculations  are  performed. 

A  second  technique  used  to  reduce  machine  precision  errors  was  to  identify  pairs  of  terms  in 
individual  variance  and  covariance  equations  that  nearly  cancel  and  arrange  the  calculations  to  account 
for  this  cancellation.  The  estimates  of  the  infinite  sums  involved  in  the  calculation  of  S—  improved  the 
most  from  this  approach. 

5.1  DERIVATION  OF  THE  FIRST  FIVE  COVARIANCE  TERMS 

Five  of  the  six  statistics  terms  are  relatively  straightforward  to  analytically  reformulate  in  terms  of 
variance  and  covariance  terms;  the  sixth  term  is  much  more  complicated.  Derivations  for  the  first  five 
terms  are  presented  in  the  section  before  presenting  the  derivation  of  the  sixth  formula  in  its  own  section. 

The  covariance  term  for  H(x,y)  is  given  by 

Vjjmn  =  E(H{x,y)H(x,y))-E(H(x,y))2  =  (55) 


V  V 

ij  mn 


and  expands  to 

5,/A+i) 

_  i/ 

z 


IJMN 


{AO(l)(v,y  +l,v  +  2)AO(l)(vm„  +1,v  +  2)-0(2)(v  +  2)} 


i,j  m,n^i,j 

VA 

V 


hj 


{a$  W  {vy  +2,v  +  2)2+  AO  (2)  [vy  +2,v  +  2)} 


+1,v  +  1)^^LAOW(vmji  +  l,v  +  l). 

y  J  1/ 


(56) 
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2 

It  is  convenient  to  treat  cr)/v/v  as  a  combination  of  diagonal  and  off-diagonal  components, 

2  2  2 

aUMN~D  <TIJMN+OD  GijmN  ’  (^) 


and  rewrite  Equation  (56)  as  the  pair  of  equations, 


and 


od  ° umn 


{a0(i)(v^.  +l,t/  +  2)A0(l)(v„„,  +1,v  +  2)-0(2)(v  +  2)} 

i,j  m,n*i,j  / 


AO>(l)(^+l,v  +  l)  X  — A0(l)(vm„+l,v  +  l)  . 

1/  ^  1/ 


(59) 


The  next  step  is  to  modify  the  Ad) 1 1 1  terms  to  have  the  same  parameter  values  for  as  many  subsets 
of  terms  as  possible  so  as  to  get  the  maximum  cancellation  between  terms.  The  first-order  statistics 
functions  contain  terms  with  v  —  1 ,  whereas  the  second-order  statistics  functions  contain  terms  with 
v  -  2  .  The  following  recursion  relation  [12]  is  useful  for  converting  the  functions  to  consistent  parameter 
values: 


d>(l)(z  +  l)  =  T'0(z  +  l)  =  T'0(z)+i  , 

z 

which  means  that 

Ad)(l)(z!  +l,z2  +2)=  AtD^Zj  +l,z2  +l) - — 

z  2  +  1 

and 

Ad>(l)(zj  +2,z2  +2)=  Ad>(l)(z,  +l,z2  +l)+— - — . 

Zj  +  1  z2  +  1 

Working  through  the  substitutions  and  the  algebraic  manipulations  results  in 


(60) 


(61) 


(62) 
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ij(V9  +1) 


hj 


'(v  +  l) 


v  -  v;, 


V  -  V- 


H+1)-- 


AO(l)(^.  +l,v  +  l)2 


+  2  7 - ^ — r  AO(l)  (v„  + 1,  v  + 1)+ 

K+iXv+i)  v  y  ' 


V  -  V:: 


,h  •  'X1'  • !) 


+  AO(2)(v..  +2,v  +  2) 


and 


OD  °  UMN 


-EE 

i,j  m,n^i,j 

l  ' 


V  V 

ij  mn 


y{y+ i)L 


-AoWf^.+lv  +  ljAoW^+l,  v  +  \) 


v  + 1 


— - - (AO(l)(v,7  + 1,  v  + 1)+  AO(l)(v,„„  + 1,  v  + 1))|  -  0(2)(v  +  2) 

V  +  1  y 


The  same  can  be  done  for  the  next  term, 


° IJM  ® IJM  ^IJ^M 


-  (A°(l>(^  +  !^  +  2)A(1)(1)(^.  +  !^  +  2)  -  °(2)(^  +  2)} 


i,j  m^i 


+Er^lMvi-+2,,+2)]! 

+  AO(1)(v,y  +1,v,..  +i)a<D(i)(k,..  +2,v  +  2)+AO(2)(v,..  +2,v  +  2)} 
-X^AO(l)(v,  +l,v  +  l)^^A0W(vm.  +l,v  +  l)  , 


(63) 


(64) 


(65) 


which  can  again  be  split  into  diagonal  and  off-diagonal  terms, 


2  _  2  2 
<JLJM~D<JJJ^~>rOD<JYf^  > 


D^TIm  -  + 


+  AOW(v..  +l,v;.  +l)AO(l)(v,..  +2,v  +  2)+ AO(2)(v,..  +2,v  +  2)} 
-  X  ~r“  A(1)  (0  (vy  + 1,  v  +  l)A0 {1)  (vf.  +  1,  v  +  1)  , 


hj 


(66) 


(67) 
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OD  G ijm 


=  X  Z  (A°(l)  +  ^  +  2)AfI)(l,k.  + 1,  v  +  2)  -  0(2)  (v  +  2)} 

i,j  m*i  nv  / 


-  £ —  AOW  (v  +  1,  V  +  l)X  —  AO(l)  (vm.  +  1,  V  +  1)  . 

V/  1/ 

z,y  m^i 


(68) 


The  equation 

AO 1 (1)  (a,  c)  =  AO 1 (l)  {a,  b)  +  AO (l)  (b,  c ) 


(69) 


is  useful  for  accounting  for  the  mixing  of  variables  between  the  two  summations  in  Equation  (67).  Again, 
substitution  with  appropriate  conversions  for  the  AO(l)  functions  can  be  used  to  get  consistent  variables 
for  cancellation,  which  produces 


2 _  _ 

D° IJM 


■y  vij  (t*  +  0 

Tj  v(v  +  l) 


V  -  V:. 


(v(.+lXv  +  l) 

AO 


AO(l)(v,..  +l,v  +  l)  +  AO(1)(u().  +l,v  +  l)+ 


v  ~  Vi. 


K+lXv  +  lJJ 


(2)(v,.  +  2,v  +  2)+  fV  Vi'J  AO(l)(vi7  +l,v  +  l)AO(l)(u,..  +l,v  +  l) 

{Vi.  +  Vjy 


(70) 


and 


OD  G  ijm 


--A<D(l)(v,  +1,v  +  1)aoW(v1b.  +  l,v  +  l) 
^  ^  v(v  +  1J  V 


i,j  m^i 


+  1,v  + 1)+  AO^(v  +l,v  +  l)}+- — 1— —  -0(2)(v  +  2) 
V  + 1  (v  + 1) 


(71) 


For  the  next  equation, 


<7  =  &—  —  , 

IM  IM  I  M  ’ 


(72) 


the  resulting  combination  is 
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<*h  =  H^TT)  V,.  +  Uy  +  2h®(,}(v„.  + 1,  v  +  2)-  4><2  V  +  2)} 

+  YJV“t“  l^{[A°(l)(^. +2,v  +  2)f  +AO(2)(v,.  +2,v  +  2)} 


'(v  +  1) 

-  2— A<D(l)(v;,  + 1, v  + 1)£— AO(l)(vm.  + 1,  v  + 1)  . 


(73) 


The  same  breakdown  into  diagonal  and  off-diagonal  terms  results  in 

v,;  (v,-.  + 1)  |[AO(i)^  +2,i/  +  2)f  +  AO(2)(+,..  +  2,v  +  2)} 


_2_ 

D^IM 
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2  — ■ A<D(l)(v„+l,v  +  l) 


V 

1 
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(74) 


and 


on 


<4  =  +l,v  +  2)AO(l)(v,„.  +l,v  +  2)-d>M(v  +  2)} 

i,m±iV\V  +  V 


+l,v  +  l)A0W(vm.  +l,v  +  l)  . 

^  v 


(75) 


Various  substitutions  and  algebraic  manipulations  produce 

v.(v.  +l)f 
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v(v  + 1) 


(v,,+lXv  +  l) 


(y  ^ACD^^.+U  +  I)2 
nv.  +i) 

2AO(l)(+,..  +l,v  +  l)  + 


r  v-v, 

v  +  l)(V  +  o 


+  AO(2)(u,..  +2,v  +  2) 
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and 


(76) 
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OD°IM 
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k+i)L 


--AO(1)(+;..  +l,v/  +  l)A0(l)(+,„.  +l,v  +  l) 
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-  (A 0(1  V,,  +  l,v  + 1)+  AO(l)(v„,.  + 1,  v  + 1)) 


-0(2)(v  +  2) 


(77) 


21 


The  AO^(x,y)  function  is  a  common  function  in  the  information  theoretic  formulae.  When  both  the 
x  and  y  terms  are  large  and  close  to  each  other,  inaccurate  estimates  may  be  obtained  if  the  function  is 
numerically  implemented  as  a  difference  of  digamma  function  calls  using  such  libraries  as  can  be  found 
in  MATLAB.  Increased  accuracy  can  be  obtained  by  using  the  recursion  relations  of  Equations  (61)  and 
(62)  to  estimate  the  AOn  )(x,  v)  function  for  large,  similar  values. 

In  working  toward  a  revised  algorithm  for  calculating  AO*'*(x,  v)  with  better  numerical  precision,  it 
should  be  remembered  that  Wolpert  and  Wolfs  equations  can  account  for  prior  probabilities.  The  result  is 
that  the  v ,  ia. ,  vjm ,  and  v  .  terms  may  not  always  be  integers  for  certain  selections  of  prior  probabilities, 
nor  are  their  differences  always  integers.  This  leads  to  a  bit  more  complexity  in  the  reformulation  of 
AOn*(x,y)  because  the  recursion  relations  in  Equations  (61)  and  (62)  cannot  be  guaranteed  to  lead  to  an 
advantageous  cancellation  between  the  pairs  of  digamma  functions  used  in  estimating  Ad>m.  In  general, 
the  differences  of  the  priors  may  be  non- integer  and  prevent  simplification  of  the  Ad)(n  functions  to  finite 
sums  by  the  extraction  of  integer  differences.  Elowever,  some  improvement  in  numerical  accuracy  can  be 
achieved  by  using  the  recurrence  relations  to  get  a  function  containing  a  finite  sum  and  the  difference  of 
digamma  functions  for  two  very  similar  variables. 

The  function  can  be  reformulated  through  the  use  of  the  recurrence  relationships  of  Equation  (61) 

to  get 


;  (w,  +pl,n2+p2)  =  %  (w,  +p1)-x¥0  (n2  +  p2 ) 

"i+pi-i  i 

=  W0  («,  +  pi )  -  W0 («,+/?,+  rem(/?2  -  pt  ,1 ))  -  £  -  (78) 

''=«i+A+rem(p2-pi,l)  r 


for  integer  sample  counts  nx  and  n2  and  prior  probability  contributions  px  and  p2 .  As  described,  the 
variables  p]  and  p2  may  not  always  be  integers. 

5.2  DERIVATION  OF  THE  FINAL  TERM,  cr^ 

The  final  sixth  term,  S—  ,  is  the  most  difficult  one  to  refactor  into  an  equation  for  the  covariance.  It 
is  derived  from  the  first-  and  second-order  statistics  to  get 
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(j —  =  —  —  S-S — 

IN  IN  IN 
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(v  +  1) 
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[l\vin+2,v  +  2)]  +m^{vin+2,v  + 
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Vin  U„(U„  +  0  , 

+  AO (1)  (vin  +2,v  +  2)x[sl  (v„ ,  v.„ ,  vin  ,vbl)+S1  (v.n ,  vjm ,  vin ,  vin )] 
+  s2(yi.,y.n,vin,vi„)} 

VjjV. 

v2 


X  AO (1)  (v,.  + 1,  v  + 1>\0 (1)  (v.„  + 1,  v  + 1)  . 


(79) 


An  easy  simplification  is  to  use  the  relation 
-  /-  ,1/1  (v,.  ~  Vin  )  +  -  Vin  )  (U.  -  Vin  \v .n  ~  V in  ) 

1  Vin  VJV,n+]) 


(80) 


to  get 
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+  Vin  (y  in  +  l)AO ' (1)  (vin  +  2,v  +  2 )  x  [Sj  (v;. ,  v.„ ,  vin ,  vm  )+S{  (v.„  ,  V,.  ,  vin ,  vin )] 
+  Vin  0 Vin  +  0^2  fa.  ,  V.„  ,  V in  ,  V,n  )} 

_  y  Vi. y y  AO  (1)  y.  + i,  v  + 1^0  « (v.n  + 1,  V  + 1)  . 

77  v~ 


(81) 


This  also  eliminates  an  awkward  pole  at  vin  =  0  ,  which  can  be  considered  to  be  a  valid  value  through  a 
limiting  process. 


The  function  St  can  be  refactored  for  better  numerical  accuracy.  First,  replace  r  with  k  =  r  —  1  to 


get 


A,(, 


V  V  V  V  =  — 
y  x  ’  v  y’  v  in’  v  in  ' 


,)= 


{vx  -  Vin  Xu  +  0 

vin{vin+ 1)  '  j^k(k  + 1) 


OO  1 

y _ L 

i—t  U  lr  _A 


{vx-vin)k+l\x_  vy-vin 


(v,„ ) 


<t+l 


V  in 


v,„  +  k  + 1 


(82) 


Use  the  equality, 

(U„  +  k  +  \\vin  )t+1  -  vin  (vin  +  \\vin  +  2\ , 


(83) 
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to  get 


S  (v  v  v  y  )-  (v*-^X^+l).  _  _ 

v  x->V y>Vin>Vin)—  —  (—  . \  +  —  /—  /  .  ,  ( ,  . \ 

Yin  \Vm  +  1 )  Vm  K  +  1 )  k\k  +  1 ) 


_L _ ^ _ 1_ 

T  _L  1  1  U  b  _L 


bx-VmX+l 


(’ V  in  +  2)k 


{vx+k  +  \) 


(84) 


Using  similar  steps  as  used  to  get  the  equality  in  Equation  (83),  the  equation 

(Uc  -  vin  Li  =  ivx  -  vin  )(vx  -  v,»  + 

can  be  used  to  reform  Sl  to 


(85) 


S  (v  v  v  v  )  -  (v  x  V'"  ^Vjr  +  ^  i 

*^1  Vx’Vy’Vin’Vin)  —  (—  + 

v,n{vin  +1) 
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k-  +4- 

(Vin  +  2)k 


{vx  +  k  +  l) 


(86) 


Better  numerical  accuracy  can  be  obtained  for  5,  by  the  use  of  judicious  summation.  The  first  term 
in  Equation  (86)  contains  factors  that  can  be  extracted  from  the  summation, 


Siy  y  y  v  )-  (v*-v"Xvx+1) 
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(87) 


To  continue  the  derivation,  define  a  function 

1  (x  +  k  +  lXx  -  z  +  \)k 


¥l(x,y,z)=\~Y_l 


k~t kik  +  0  (x  +  l)U  +  y-z  +  2)k 


(88) 


Better  numerical  accuracy  can  be  achieved  by  the  cancellation  of  multiple  small  values  instead  of 
two  larger  values.  The  function 


OO  j 

§*(*+if 1 

can  be  used  to  redefine  (x,  y,  z)  as 


(89) 
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Now 
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(91) 


For  the  numerical  summation  of  F \{x,y,z),  the  second  term  inside  the  square  brackets  eventually 
drives  toward  zero.  Once  this  value  is  less  than  the  machine  precision  of  the  numerical  calculation,  it  can 
be  neglected  and  the  rest  of  the  summation  analytical  performed  with  the  function 


y  1  1 

hqk(k  +  ])  <1  ' 

The  double  summation 


S  (v  V  v  v  )  -  ^'/'*  V‘n  ^|/*”  V‘n  ^ 

2  V17,-.  >  v.n  >  Vin  ’  Vin  I  ~  (—  \ 

Mi 


~(Vi.  -  Vm  E 

5=2 

00 

~b.n  -  Vm  £ 

GO  00 


{V.n-Vin)s  1 


(Vi.  ~Vin)r  1 

Mr*  1 

(Vi.-Vin)r(V.n-Vm)s  1 


r-2  5=2 


)r+s  r{r  - 1)  5(5  - 1) 

can  be  rewritten  by  switching  from  r  and  ,v  to  k  and  / ,  where  k  =  r  —  1  and  l  =  s  —  1 : 
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Next,  terms  from  the  sums  are  extracted  to  get 
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The  terms  in  the  equation  can  be  rearranged  to  give 
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The  expansion 
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>-z 


1  0 -Vm+1)/ 


tf^  +  1)  (v/,+2), 


4  y"  1 _ 1  4«  U»  +  Oa-  4«  Un  +  1 )/  4  1  O';.  Un +1)a 

Ub  i)  Ui  i)  trr  .n’t  trr  ind  ^ 


A=i 


tr  *4  +  1)  /(/  +  1)  4„  +  2\  (vin  +2  +  4  Uk{k  +  \)  4,  +  2) 

and  terms  rearranged  to  get 


4.  “Uj 


4„Xu«  +0 


!-S 


1  (^.,,-^+1); 


r/  i(i  + 1)  + 2), 


y"  1  4«  V»i  +  0/;  |  4  1  4n  Vm  +  0/ 
■  &  *(*  + 1)  4,  +  2\  \  4  /(/  + 1)  4,  +  2  +  4 


Using  a  similar  approach  to  that  used  in  Equations  (88)  through  (90),  Equation  (102) 
rearranged  as 


S2  = 


4.  ~VmtV.n  ~  V  in  ) 


4  1  fi  4n  Un  +0/ 
4/(/+i)  4, +2), 


4J4«  +1) 

4  1  4»  ~  vm  +  4a  4  1  L  4*»  ~  Vin  +  0; 

^4  +  0  4„  +  2)a  m  4  + 1)  1  4«  +  2  +  4  . 

Next,  the  following  functions  are  defined: 

1  4--  +  1)a 


f2(ut4=1-X 


f^k(k  +  \)(x  +  y-z  +  2)k 


and 


F34  T,z)=  X 


(x  -  z  +  1)a 


-X-U-Ji-  (-v-z+i), 


A=1 


k{k  +  l)(x  +  y  -  z  +  2)*  4  4  +  0  {  (x  +  y  -  z  +  2  +  4 


or  equivalently 


(100) 


(101) 


(102) 


can  be 


(103) 


(104) 


(105) 
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(106) 


F3(x,y,z)  =  V-7 - ~T7 - — - r-F 2(x  +  k,y,z)  . 

tHclk  +  llx  +  y-z  +  21 


Then 


(,/'»  Vin  X'Th  V in  )  IV.  /  1  —  F  (1/  1/  1/  )1 

/—  V—  r  \  If  2  V»n’vit’vin)  F3  V«n  ’  V  i»  ’ V  in  )\  ’ 


(Vln1\Vin  +  0 


(107) 


or,  by  symmetry, 


52  =  (-V'\ _Vi"/:V-  ^  [f2  (v,-.  ,  v.B ,  Vm )  -  F3  (v,-.  ,  v.B ,  v;, )]  . 

Wm  Xvi»  +  u 


(108) 


The  functions  F2 (y  x ,  v ,  vin )  and  F3 (vx ,vv,  vin )  can  be  implemented  in  computer  code  for  improved 
numerical  accuracy.  The  choice  between  using  Equation  (107)  or  (108)  can  have  a  significant  impact  on 
the  numerical  accuracy  of  the  results  when  calculated  with  finite  precision  computers.  More  accurate 
results  are  achieved  when  the  first  term  is  the  larger  of  v.  and  v  in  functions  F, (v  ,v  ,v  )  and 

F3 

With  the  reformulated  functions,  the  equation  for  cr^  updates  to 


(n.W,,  +W,){A<1)(l)(w,  +2,v  +  2f  +A<D(2)(u,,  +2,v  +  2)} 

-A®®(vin  +  2,v  +  2  X(v.„ -Vin\v.n  +l)F1(v.II,vi.,vte)+(vi. -Vfjy,.  +l)F1(v,-. 
+  -  W,  \v.n  -  Vm  )[F2  (v,.  ,  v.B ,  vfe )  -  F3  (yt. ,  v.n ,  )]} 

_  Yi^n. AO(l)(v;.  +  l,V  +  l)A0(l)(v.„  +  1, V  + 1)  . 

V 


v.„,vin)} 


(109) 
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Next,  the  equation  can  be  reorganized  to  move  related  terms  closer  together: 

=  T-T-A  ( (W-  +  v‘»  )A°(2)(^  +  2,  v  +  2) 

+  {y,.y.n  +  ^  )a®(1)(v;„  +  2,  V + 2  )2  -  l/-v>"  (v + ^  Ad»W(v,,  + 1,  v + l)AO(V.„  + 1,  V + 1) 

-AO(l)(v;.n  +2,u  +  2){(v.„  +l)F1(v.„,vi..,viJ+(v/.  +l)F1(v,..,v.„,v/J} 

+  (n.  -  ^  -  Vm  lF2  {v„  ,  v.„ ,  Vin  )  -  F3  (Vf. ,  V.n ,  vin  )]  }  . 

The  relationships 

A®(l)(v,..  + 1,  v  + 1)  =  A®(l)(v,..  +  \,y.n  + 1)  +  A®(l)(v;.„  + 1,  v  + 1) 


a®(1)(f;.„+2,u+2)  =  a®(1)(f;,,  +i,v+i)+^— - — 1- 

v~„  + 1  v  + 1 


or  equivalently 


+2,v  +  2)=  A<t/:i,(v-„  +  I.V+I)-  T'"* 


(n.+iXi'+i)’ 


can  be  used  to  further  simplify  £X— . 

A  new  variable  is  defined  to  continue  to  refine  the  equation  for  through  a  number  of  steps: 

Tx  =(v<.v.„  +v,„)A®W(viii  +2,v  +  2f  -Wsk±l )A<D(i)(v;,  +l,v  +  l)A®(1)(v.„  +l,v  +  l)  .  (114) 


Terms  are  rearranged  to  get 


Tl={vi.v.n  +vi,[A®«(i7,  +l,v  +  l)+  ^  1 

l  \Vin  +  1XV  +  1)J 

-  >v*«(v  +  1VA<s)(0(v  +  i,vin  + 1)+ a®(1)(r„  +  i,  v  + 1))> 

V 

(a®W(k.„  +l,vin  +1)  +  A®(l)(i7,  +l,v  +  l))  . 
Reformulation  continues  to 
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T1  =  (^.„  +vln)x 


A0W(v;,  +l,v  +  l)2  +2  JV  ^  '  AO  ^ (yin  + 1, v  + 1)  + 

[  \vm+\lv  +  \) 

-  V,'V'" ^  +  ^  (a(D0)(v,.  + 1 , v in  +  l)+AO(l)(uw  +l,v  +  l))x 

V 

+l,vin  +l)+  AO«(v,,  +l,v  +  l))  . 

The  variable  can  be  reduced  to 
T\  =(vl.v.n  +v,n)x 

AO«(i7,  +l,v  +  l)2  +2  A^\vin+\,v  +  \)  + 

{  K+1XV  +  1) 

-  —  —  + 1,  v„  +  l)A®l0(r„  +l,i^„  +1) 

V 

+  Acb«(v,,+l,v7.)i+l)AO(l)(v1„+l,  V  +  l) 

+  A4)V(v.n  +1,1/  +  l)AO)W(v.„  +  l,vin  + 1) 

+  AO)W(v,i  +l,v  +  l)2)  . 


’-Vjn) 
fl)(v  +  l) 


~Vin) 

fl)(v  +  l) 


Squared  terms  are  collected  to  get 
T\  =  A<D(l)(u,.n  + 1, v  + 1)2  (vimv.„  +vin)- 


^•>  +  0 


+  (y,y.„  +  { 2  1 (v'  v-  \  ao w  fa,  +  i,  v  +  i) + [  {v  v"'  \  , 

{  fa+W/  +  1)  L'Vm  +,A^  +  I) 

_V,,V.„(V  +  1)^0(1)(  +1)AOd)  (Vtn+1,yin  +1) 

V 

+  A®M(vi.+l,vin+l)A®M(vin+l,  v  +  l) 

+  AO(l)fa  +1,u  +  1)AO(1)(v.„  +1,V,.„  +1)) 
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and  reduced  to 


T{  =AO(l)(i7.n  +l,v  +  l) 
+  +  vin  1  2 


v,„  -- 


(V  -  V,n  ) 


a$(1)(v;,  + 1,  v  + 1)+ 


(vm  +  \lv  +  \) 

—  (aOW(v,,  +l,vin  +l)AO(l)(v.„  +l,vin  +1) 

+  AO(l)(v;..  +  l,vin  +l)AO(l)(v;.„  +1,  V  +  l) 

+  A0W(i7,  +l,v  +  l)A0W(v.„  +  l,vin  +1))  . 


(ZzXml 

[vin+\\v  +  \) 


~\2\ 


Fold  the  result  back  into  the  equation  for  a2..  to  get 


I 


1 

v{v  + 1) 


v 


•n 


+  v„)\S>,-1'(vm+2,v  +  2) 


+  A®w(r,  + 1.  v  + 1)2 


v,„  - 


V  V 

f  •n 


V 


+  (y,:Kn+yJ  2 73- 


(f,:J  ,'Aa.<l>(i7„l+i,v+i)- 


k-r.) 

fa+ixv+ir-  v,Lfe+iXv+i) 

^t±ll  (A®(,)(n. + 1,  v„  +  i)A®(,|k  + 1.  r,  + 1) 


2  h 


+  AO(1)(v,,+l,v;,+l)AcD(l)(v;,+l,  v  +  l) 
+  AO«(v;,  +l,v  +  l)AO«(v.„  +1,^+1)) 

-  AO«(v;,  +2,v  +  2Xv.„  -vjv.„  +1)F1(v.„,vi.,v/„) 

-  AO(1)(^„  +  2, v  +  2)(v;.  -  +  l)F,(v,.., v.„, vj 

+  ^  )(v.»  -  vin  )[p2  K ,  v.„ ,  u,„ )  -  F3  (v;.  ,  v.„ ,  vin )] }  . 


(119) 


(120) 
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The  remaining  A<I>(ll(ien  +  2,  T  +  2)  term  is  expanded  to  get 


<4  = 


=  Z  +v») A°(2)(^  +2^  +  2) 

„  n^  +  i) 


+  AO(l)(v,;  +l,v  +  l)2 


V:„  -- 


■  (y.  v  +  V-  I 
v  ;•  •«  i«  i 


(y-v*) 


-A<B(l)(vto  +l,v  +  l)  + 


k+lXv  +  1)  .  .  1) 

\ 

{v  +  jj (aoC1) (v;.  + 1,  vin  +l)AO(l)(v.„  +l,vto  +1) 

V 

+  AO(l)(v,..  +\,vin+\)m{']{vin  +1,  v  +  i) 

+  AO^vin+\,v  +  \)AO^{v.n+\,vin+l)) 

AO(l)  (v,,  + 1,  v  + 1)  +  JV  ^ 


2\ 


J  J 


in  +  l)^  +  Oy 

{(^.B  -  Vin  lV-n  +  l)Fl  (v'.B  >  ^ )  +  ta  -  V*,  )(v,..  +  l)Fj  (v,.  ,  V.„  ,  vin  )} 
+  -  V'iB  X^.«  -  ^  )[F2  (v,-.  ,  V.„ ,  vin )  -  F3  (via ,  v.n ,  vin )]}  . 


(121) 
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Common  terms  in  AO(1)  are  collected: 


IN  Z  {  V-n  +  Vin  )A°(2)  iVin  +  2>  V  +  2) +  (w.„  +  ) 

v{v  +  \)^ 


CT7,,  = 


(zzXiA 

.fa,  +lX*'  +  l). 


+  A<D[l)(rf„  +l,v  + 1)" 


v,„  -- 


fe+iX^+i) 


+  AO(l)(v,.„  + 1,  v  + 1) 


^•.v.„(v  +  l) 


A$(l)(v.„  +l,vto  +1) 


-  fa„  -  vto  x^.„  +  l)Fi  fa, ,  V,. ,  K„ ) 

-  (k.  ~  Vin  \vi.  +  !)  F1  fa.  >  V.n  >  Vin  ) 


^  +  At»Vi.  H-  l,n.  +  l)M,('y.,  +  l,v„  + 1) 


fa-^J 


(122) 


;  (fa„  ~  vin  \v.n  +  lfa fa„ , vt. , vin )  +  fa.  -  vin  \vt.  +  l)F,  (v;. , v.n , vin ) 


fan  +  +  0 

+  fa.  ~  Yin  \v.n  -  Vin  )[F2  fa. >  V .n  >  ^ in  )  “  F3  fa.  ,  Kn  ,  Kn  )]}  • 


This  provides  a  reformulation  of  (7 with  better  numerical  stability 
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6.  NUMERICAL  RESULTS 


The  same  process  used  to  evaluate  the  original  MATLAB  code  was  used  to  evaluate  the  first 
revision  of  the  code,  based  upon  the  derivations  of  Section  5.  Figure  3  shows  a  plot  of  the  absolute  values 
of  the  eigenvalues  for  the  original  MATLAB  code  and  the  revised  code  for  different  multiplier  scales 
applied  to  the  accumulation  matrix  in  Equation  (54).  As  the  scale  factor  increases,  the  eigenvalues 
decrease,  corresponding  to  more  accurate  estimates  caused  by  the  increased  sample  size.  Negative 
eigenvalues  are  shown  in  red  and  indicate  where  the  algorithm  has  failed  to  accurately  calculate  the 
components  of  the  covariance  matrix.  The  original  algorithm  fails  to  estimate  good  variance  and 
covariance  values  for  sample  sizes  on  the  order  of  107.  The  revised  version  is  able  to  handle  sample  sizes 
on  the  order  of  1011  before  the  results  become  corrupted.  Although  positive,  the  smallest  eigenvalue  for 
the  revised  code  at  sample  sizes  of  10 12  is  suspect  because  it  departs  from  the  relatively  linear  trend  in  the 
log-log  plot.  Clearly,  the  revised  code  fails  at  10 13  because  a  negative  covariance  eigenvalue  is  estimated 
for  the  scaled  test  matrix. 


Eigenvalues  for  Different  Scaled  Sample  Counts 


Using  8x8  matrix  of  frequencies 


10  10 
Scaled  sample  counts 


Figure  3.  A  plot  of  the  absolute  value  of  the  eigenvalues  of  covariance  matrices  calculated  for  an  accumulation 
matrix  with  different  scale  factors  for  two  different  software  formulations  of  the  information  theoretic  equations.  The 
x-axis  is  the  scaled  sample  counts  in  the  matrix.  Negative  eigenvalues  are  shown  with  red  circles  and  squares  to 
indicate  that  the  covariance  terms  are  incorrectly  estimated. 
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Figure  4  shows  the  computation  time  for  the  original  MATLAB  code  and  the  revised  code  for 
different  scales  of  the  accumulation  matrix  in  Equation  (54).  Computation  time  is  significantly  reduced 
for  most  values  in  the  matrix  with  the  revised  code,  except  for  a  range  of  values  associated  with  the  scale 
factor  range  of  109  to  1011.  This  is  caused  by  the  execution  of  the  functions  associated  with  the  infinite 
sums.  The  maximum  difference  is  less  than  a  factor  of  10;  the  increase  in  computation  time  could  have 
been  many  orders  of  magnitude  higher  if  the  refactoring  had  led  to  an  inefficient  algorithm.  Although  the 
execution  time  for  reformulation  of  the  computer  code  was  secondary  to  achieving  improved  numerical 
precision,  the  execution  time  was  tracked  so  that  changes  in  computation  time  for  different  formulations 
could  be  monitored. 
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Computation  Time  for  Different  Scaled  Sample  Counts 


•■■■©■■  Original  code 
■■■■*""  Revised  code 
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Figure  4.  The  computation  time  for  calculating  the  entropies  and  covariance  matrices  for  different  scaling  of  an 
example  accumulation  matrix  with  the  original  MATLAB  code  and  the  revised  MATLAB  code. 


It  is  recognized  that  the  presented  results  are  for  only  one  8x8  accumulation  matrix.  Other 
accumulation  matrices  may  fail  at  lower  sample  counts.  Specifically,  matrices  with  large  counts  in  a  few 
bins,  such  as  is  the  case  for  the  evaluation  of  tracking  algorithms,  may  be  susceptible  to  inaccurate 
estimates  for  lower  sample  sizes.  Despite  continued  numerical  instability  of  the  revised  algorithm,  the 
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optimization  of  the  algorithm  has  extended  the  accuracy  of  the  evaluation  to  sample  sizes  five  orders  of 
magnitude  beyond  the  original  implementation. 


The  primary  variable  suffering  the  most  from  numerical  stability  issues  is  .  The  four  variables, 
v  ,  vin ,  vim ,  and  v,n  can  be  selected  as  the  fundamental  variables  in  the  equations  for  cr^. .  All  other 
variables  can  be  defined  as  combinations  of  these  four.  As  previously  discussed,  the  accumulation 
matrices  used  to  evaluate  tracking  algorithms  have  very  large  counts  in  the  true  negatives  bin.  Generally, 
there  are  three  classes  of  number  ranges  for  which  the  evaluation  has  to  estimate  terms:  the  class  where 
vin  is  the  true  negatives  bin:  {large  vin ,  vi% ,  and  v.n };  the  class  where  either  the  row  or  column  includes 
the  true  negatives  bin  {small  vin ,  small  vi. ,  large  v,n  }  or  {small  vin ,  large  vt, ,  small  v,n  };  and  the  class 
of  counts  of  the  number  of  times  that  pairs  of  tracks  from  two  data  sets  associate:  {small  vjn,  small  vjm, 
small  v,n  }.  The  first  two  classes  encounter  numerical  stability  problems  long  before  the  third  class.  Thus, 
only  the  first  two  classes  have  to  be  examined  for  potential  techniques  to  improve  numerical  stability.  The 
numerical  stability  of  the  third  class  will  improve  by  default. 

To  assess  the  numerical  accuracy  of  the  improved  equations,  the  first  two  sets  were  examined  for 
numerical  stability  issues.  Different  combinations  of  v  ,  vin ,  vt, ,  and  v,n  were  selected  and  scaled  by 
different  orders  of  magnitude,  ranging  from  1  to  10 12.  A  few  samples  are  plotted  in  Figures  5  and  6.  For 
Figure  5,  the  case  where  all  variables  are  large  values,  the  calculation  begins  to  lose  precision  at  a  scale 
factor  of  108,  or  when  the  counts  are  around  1014.  For  Figure  6,  where  one  variable  is  large  and  the  other 
two  small,  the  calculation  begins  to  lose  precision  at  a  scale  factor  of  1 09,  or  when  the  counts  are  around 
1015. 


The  numerical  stability  of  this  first  revision  is  acceptable  for  most  applications,  but  because  the 
accumulation  matrices  for  trackers  can  have  extremely  large  values  for  the  count  of  true  negatives, 
numerical  stability  would  be  advantageous  for  count  sizes  of  1016  for  v/m,  and  v,n.  The  next  section 
continues  to  restructure  the  equation  for  cr^.  .  to  get  additional  improvement  in  the  numerical  stability. 
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.  Values 


Figure  5.  A  sample  plot  of  the  first  class  of  count  distributions  for  c — ,  where  all  counts  are  large. 
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Figure  6.  A  sample  plot  of  the  second  class  of  count  distributions  for  <x— ,  where  the  row  or  column  count  sum  is 
large  and  the  other  counts  are  small. 
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7.  FURTHER  REFINEMENT  OF  THE  NUMERICAL  ESTIMATES 


To  further  refine  the  organization  of  <Jj^  to  get  better  numerical  stability  for  tracking  algorithms, 
the  terms  in  cr^. -  were  examined  to  identify  possible  combinations  of  sub-terms  that  would  cancel  as  the 
scale  of  the  numbers  v  ,vin,  vit,  and  v.„  increased.  The  sub-terms  selected  from  oj-  are  defined  as: 


a 


=  (yi.y.„+yini  7 - 


{V~Vin) 


(^,+lXv  +  l) 


P  =  {vi.v.n+vin)^(l)(yin+2,v  +  2), 


y  =  AO(1*(v,„  +  l,v  + 1)" 


V  V 
V 


K  +  +  1) 

s,  =  -AO(l)(fr,>,  + 1,  V  + 1)  V‘-V'n  (V  +  ^  AO  (V{,  +l,vin+l), 

v 

S3=-A^l\vin  +l,V  +  l)Vi-V"'(V  +  lh®{l\v.n  +1  ^  +1), 

V 

5a  =  -AO {l)(vin  +l,v  +  l)(v.„  -  v in  ){v.n  +  1)F,  {v.„,vim,vin ), 
S5  =  -AO(l)(vF,  +l,v  +  lXvi# -vin\vi%  +l)F|(u,.,u.„,v;J, 

g  =  _»^.>  +  1)AO(i)(  +l,vin  +  l)AO -i)(i/.„  +  l,vin  +1), 
v 

(v  -  ) 


iy.n  -  Vm  h.n  +  i)Fi  {y.n .  , v;„ ) , 

(fr'.  -  Vtn  Xu.  +  0FI  (fr'.  >  V.n  ,  Vin  )  , 


fl="fe+iX''  +  0 

.  (V  -  Vin  ) 

f2">,„+lXv  +  l) 

71  =  (v„  -  v,B  )(v.„  -  v,„  )F2  (v„ , v.n , v„, ) ,  and 

72  =  “(u.  -  )(v.„  -  U,„  )f3  iytm ,  v.„ ,  ) . 


(123) 

(124) 

(125) 

(126) 

(127) 

(128) 

(129) 

(130) 

(131) 

(132) 

(133) 

(134) 

(135) 


Figure  7  shows  how  the  sub-terms  a ,  [i ,  y ,  A, ,  and  c  change  as  the  scale  is  increased  for  one 
selection  of  v,  vin,  vim,  and  v,n.  The  plotted  data  are  divided  by  (v  +  l)  because  these  terms  then 
converge  to  a  constant  value  as  the  scale  factor  increases. 
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Figure  7.  The  sub-term  values  of  crj^  that  converge  as  a  function  of  increasing  scale  factor. 


Unfortunately,  it  can  be  seen  in  Figures  8  and  9  that  the  terms  S2  through  S5,  e ,  rjl  and  77,  do  not 
independently  converge  when  divided  by  (v  +  l).  Fortunately,  Figure  10  shows  that  the  combinations 
S2  +  S4,  £3  +  S5 ,  and  e  +  t]x  +  rj2  nearly  cancel  as  the  scale  factor  increases  until  round-off  problems 
creep  in  above  1 09.  The  pairs  can  be  reformulated  as  combined  single  and  double  summations  to  get  some 
additional  improvement  in  numerical  stability. 
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Values 


Figure  8.  The  sub-terms  of  aj- -  that  do  not  converge  to  limiting  values  as  a  function  of  increasing  scale  factor. 
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Absolute  Values 


Figure  9.  Non-converging  sub-terms  of  cr-^  plotted  as  logs  of  absolute  values. 
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Figure  10.  Selected  sums  of  sub-terms  of  with  significant  cancellation. 


The  reformulation  of  the  combination  of  sums  to  get  improved  cancellation  requires  a  summation 
for  A®  '  that  is  in  a  form  similar  to  the  summations  of  Fi,  F2,  and  F3.  These  reformulations  improved  the 
numerical  accuracy  by  folding  constant  terms  into  the  summation.  The  difference  of  the  digamma 
functions  A<1> 1 1 1  can  be  defined  as  an  infinite  sum  or  as  a  hypergeometric  function  of  unity  argument 
(z=  1)  [13]: 


T0  (v)-T0  (/,)  =  £ 

n= 0 


O' -^Li 

(n + ^L 


(V  -/')< r 


(136) 
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Ignoring  the  common  factor  of  —  AO^(v(.n  +l,V  +  l)  in  the  pairs  of  deltas,  the  sum  for  the  unique 
terms  in  S2  +  S4  is 

S (S2  +  S4)  =  V“ V’"jV  +  l Ad) ( ' } (u,.  + 1 , vin  +  l)+(v.„ -vin\v.n+\)vl{y.n,Vi.,vin)  ,  (137) 

which  is  the  pair  of  summations, 


v  7~t  k\vu,  +  ])k 

CO  j 

“  (1/.«  -Vinb.n  +})YaUr 


k(k  + 1) 


k= 1 


1- 


(u.„  +  k  +  \){v.n  -  yin  +  l)t 

(V'.h  +  1  \Vin  +  2)i- 


After  combining  sums  and  rearranging  terms, 
S{§2  +S4)=-{v.n  -vin){v.n+\)y. 


If  1 

,  iy.n-vin+\)k 

(  Vi.V.n  iy  +  l)  {k  +  l)  !  (v/n  +  1  \v.n  +  k  +  l)  Yl] 

]£*(*  +  !) 

•a. 

+ 

K 

H 

_ 1 

I  (V.n  +  lV  (V.n  ~  Vin  +  k)  {vin  +  1  +  k\v  +  l)JJj 

(138) 


•  (139) 


The  identity 

(a  +  \^a  +  2)k=(a-\-\  +  k\a  +  \)k  (140) 

has  been  used  to  obtain  common  Pochhammer  terms  in  the  two  sums.  The  sum  S3  +  S5  can  be  obtained 
from  Equation  (139)  by  exchanging  vt.  and  v,n  and  accounting  for  the  neglected  common  factor  of 
•  l.t' -1). 

The  sum  of  the  terms  for  \e,r)l,rj1\  is 

S{e  +  i h  +th)  =  -V'-V-n('V  +  ^AVi]){vl.  +\,vin  +l)AO(l)(u.„  +1,^+1) 

v  (141) 

+  (V,.  ~  Vin  \V.n  ~  Vln  )[F2  (V  l%  ,  K„  ,  Vln  )  ~  F3  {V,.  ,  V.„  ,  V in  )]  > 

and  with  the  summations  explicitly  written  out  is 
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S{e  +  %  +n2  )  =  - 


vi.v.„  (v  +  l)  (-  V-  VST1  (Vw  Vi‘  +  ^)*-l  "V  (V"«  V*«  +  ') 

- Win  ~  Vi.  A  Yin  ~  y.n  ^  - 2. - j(~  .  A 

v  tt  %«+!)*  m  %ft,+1)/ 

O'.*  -n«  +4 


*=i 


1- 


(v,-.  +v.«  -n™  +2)i- 


-z 


(n.  -I'm  +  4 


u  k{k  +  l)(v,..  +  v.n  -  vin  +  2)k  j~(  1(1  +  l) 


CO  1 

Ztttt 


1- 


(v.„  +l), 


O',-.  +v.„  -Vto  +  2  +  4 


(142) 


This  can  be  rearranged  into  the  combined  double  summation 


S{e  +  7i  +  ^2 )  =  (v,.  -  vto  )(v'.„  -  vta  )£  - 


1- 


k= 1 


co  i 

Ztttt 


-  *7„  +  4 

^  +  1)T  ^in+2)k 

(V.n  -  Vin  +  1), 


1- 


+ 


(Vj.  -  V in  +  4 

(Vin  +  2)k  'tl  l(l  +  ^)\  (y in  +  2  +  k ); 

Vg,  +  1)  +  jfe.  +  k  +  j)  ^  +  OK.  ~  V/H  + 

v(vfa+l)  (vf.  -v^+h)  (Vfa+l), 


(143) 


Equation  (143)  is  symmetric  under  the  exchange  of  vim  and  v.n,  but  numerical  computation  is  faster  and 
more  accurate  when  the  larger  term  is  associated  with  vit . 

Implementation  of  the  second  functional  revisions  in  MATLAB  leads  to  the  data  plotted  in 
Figure  11.  For  the  code  based  upon  this  version  of  the  functions,  departure  from  the  appropriate  limit  still 
begins  to  occur  at  a  scale  factor  of  10s ,  or  a  total  count  size  around  10 14 .  This  departure  is  significantly 
less  dramatic  than  that  seen  in  Figure  11.  Through  the  examination  of  the  scaling  behavior  of  about  70 
other  combinations  of  counts,  the  numerical  errors  have  been  found  to  grow  when  v.„  >  10 11  or 
v  >  1011.  Because  of  this  behavior,  the  MATLAB  code  that  estimates  the  covariance  values  has  an 
option  to  rescale  the  collection  of  counts  so  that  v..  <  10 11  and  v  <  10 11  for  specific  values  of  i  and  n 
when  calculating  terms  in  cr^ .  The  four  rescaled  values  are  used  to  estimate  the  sub-terms  of  cr| -  in 
Equations  (123)  through  (134),  and  the  software  accounts  for  the  division  by  (v  +  l)  when  estimating  the 
sub-terms.  A  final  division  by  the  unmodified  count  v  is  applied  to  the  sum  of  sub-terms  to  get  the 
contribution  to  cr for  the  overall  sum  of  i  and  n  .  This  option  is  turned  on  by  default  in  the  software 
package  and  may  be  disabled  by  the  user.  The  software  issues  a  warning  to  the  user  if  this  scaling  is 
invoked  during  execution.  It  is  believed  that  the  fractional  error  between  the  values  calculated  by  the 
software  with  rescaling  and  the  true  values  are  small  enough  to  be  ignored  in  most  applications.  The  final 
division  of  the  scaled  terms  in  cr.  by  the  unmodified  value  of  v  causes  the  overall  value  of  a=-  to 
continue  to  decrease  in  magnitude  as  the  counts  continue  to  increase. 
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Figure  11.  The  combined  and  reformulated  sub-terms  of  aL  for  different  multiplier  scale  factors. 


Figure  12  shows  the  eigenvalues  for  the  entropic  covariance  matrices  produced  by  three  different 
versions  of  the  algorithm.  The  results  were  produced  using  the  accumulation  matrix  in  Equation  (54).  The 
eigenvalue  calculations  from  the  second  revision  of  the  algorithm  are  stable  for  large  sample  sizes  that  are 
scaled  up  from  this  test  matrix.  Part  of  this  stability  was  achieved  by  scaling  the  counts  in  the  estimation 
of  cr^,  as  the  counts  passed  1011.  This  is  not  to  say  that  other  accumulation  matrices  will  not  suffer  from 
numerical  round-off  problems,  but  that  the  estimates  look  reasonable  for  this  example.  It  might  be  noted 
that  at  some  point  in  increasing  the  sample  size,  the  major  inaccuracy  in  the  estimated  mean  values  of  the 
information  theoretic  measures  will  be  the  round-off  error,  even  with  the  use  of  double  precision 
arithmetic  with  a  precision  of  about  10"16.  The  errors  will  be  smaller  than  the  double  precision  limits  on 
the  means,  and  researchers  will  have  to  account  for  the  precision  of  the  mean  in  addition  to  statistical 
errors. 
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Figure  12.  Eigenvalues  for  the  original  and  two  versions  of  the  information  theoretic  algorithm.  Revision  2  rescales 
the  counts  if  the  values  are  too  large  to  avert  numerical  precision  problems  when  estimating  eigenvalues.  Negative 
eigenvalues  are  shown  with  red  circles,  squares,  and  triangles  to  indicate  that  the  covariance  terms  are  incorrectly 
estimated. 


The  computation  times  for  the  three  different  algorithms  are  shown  in  Figure  13.  The  second 
revision  runs  with  fairly  consistent  times  across  different  scale  factors  for  the  counts  in  the  8x8  test 
matrix.  The  first  revision  is  generally  the  fastest  except  for  when  the  number  of  samples  is  around  1 09  to 
1011.  Both  revisions  usually  execute  more  quickly  than  the  authors’  original  implementation  of  Wolpert 
and  Wolfs  equations.  The  second  revision  always  executes  faster  than  the  original  code  for  the  example 
shown. 
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Figure  13.  The  computation  times  for  different  scale  factors  on  a  set  of  counts  for  three  different  algorithms  for 
computing  information  theoretic  terms. 


The  plot  is  for  only  one  accumulation  matrix.  It  is  expected  that  even  the  second  revision  will  show 
increases  in  estimation  times  like  that  of  the  first  revision  for  other  accumulation  matrices.  Users  should 
not  expect  that  the  computer  code  associated  with  the  second  revision  will  execute  with  the  consistent 
times  shown  in  this  plot. 
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8.  RESULTS  OF  THE  FINAL  FORMULATION  FOR  TRACKING  ALGORITHM 

EVALUATION 


Once  it  was  believed  that  reasonably  stable  estimates  of  covariance  errors  could  be  made  for  any 
sample  size,  an  evaluation  of  the  performance  of  two  simulated  tracks  was  performed.  Figure  14  shows 
the  results  from  the  simulated  data  sets  used  to  make  Figure  1.  The  plotted  error  ellipses  are  95% 
confidence  regions. 


Information  Coverage  Plot 


Figure  14.  An  information  coverage  plot  for  simulated  track  data  generated  with  revision  2  ofWolpert  and  Wolf’s 
equations. 


An  uninformed  assessment  of  the  plot  would  be  that  the  data  sample  size  is  insufficient  to 
confidently  determine  the  best  tracker.  Part  of  the  misleading  nature  of  the  plot  is  that  the  95%  confidence 
regions  are  in  a  two-dimensional  space  and  therefore  need  to  be  quite  large  to  enclose  95%  of  the 
probability  densities.  The  ellipses  do  not  provide  an  accurate  visual  indication  of  the  confidence  that  one 
tracker  is  better  than  another  when  they  overlap.  Direct  evaluation  of  total  conditional  entropy  provides  a 
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much  better  estimate  of  statistical  significance  than  the  two-dimensional  data.  This  is  partly  because 
changes  to  the  measurements  that  are  parallel  to  the  dotted  lines  in  Figure  14  do  not  change  the  total 
conditional  entropy  and  can  be  considered  to  be  a  nuisance  dimension.  Integration  of  the  error  ellipses  to 
remove  this  nuisance  dimension  produces  the  data  plotted  in  Figure  15  and  leads  to  a  more  informative 
hypothesis  test. 
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Figure  15.  A  plot  of  the  total  conditional  entropy  for  two  trackers,  with  error  bars. 


The  proposed  measure  for  the  overall  comparison  of  the  relative  performance  between  trackers  is 
the  total  conditional  entropy.  Figure  15  shows  the  total  conditional  entropy  for  the  two  example  trackers. 
Tracker  1  is  considered  to  perform  better  than  Tracker  2  because  of  the  lower  total  conditional  entropy. 
The  total  conditional  entropy,  with  error  estimates,  for  Tracker  1  is  0.0142  ±  0.0015  and  for  Tracker  2  is 
0.0217  ±  0.0019.  The  probability  that  statistical  effects  may  have  led  to  the  wrong  conclusion  can  be 
calculated  from  the  test  statistic  of  Equation  (11)  and  is  0.001.  In  other  words,  there  is  about  1  chance  in 
1000  that  tracker  two  is  really  better  than  tracker  one,  given  the  sample  statistics.  This  estimate  should  be 
considered  in  light  of  the  caveats  that  the  error  estimates  only  incoiporate  contributions  due  to  statistical 
effects  and  do  not  include  any  contributions  from  systematic  effects.  It  also  does  not  account  for  any 
biases  due  to  the  fact  that  the  true  distributions  are  non-Gaussian  and  the  values  are  always  non-negative. 
This  is  likely  to  be  a  second-  or  third-order  effect  in  that  both  distributions  will  be  skewed  to  the  same 
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order.  Even  with  the  consideration  of  these  caveats  and  possible  biases,  one  might  still  be  inclined  to 
prefer  Tracker  1  over  Tracker  2.  Analysis  presented  later  in  the  report  on  the  sensitivity  of  the  information 
theoretic  measures  to  parameters  in  the  analysis  software  may  incline  one  to  make  a  different  choice. 

In  the  event  that  the  power  of  the  test  (as  evidenced  by  the  statistics)  is  insufficient  to  draw  a  strong 
enough  conclusion  as  to  which  is  the  best  tracker,  a  larger  data  set  may  be  acquired  for  testing  or  the 
results  of  multiple  independent  data  sets  may  be  combined.  The  information  theoretic  measures,  //(x) , 
Il(y)1fl(x,y)Jl(x\y),  and  H(y\  x),  are  additive  for  independent  systems  and  provide  a  convenient 
method  for  increasing  statistical  support.  The  information  theoretic  measurements  and  errors  from 
multiple  independent  tests  can  be  combined  and  the  resultant  sums  used  to  assess  the  statistical 
significance  of  the  differences  in  total  conditional  entropy.  With  simulations  of  tracker  performance,  it 
may  be  possible  to  estimate  the  expected  statistical  power  that  a  data  set  will  provide,  given  the  number  of 
tracks  and  their  lengths. 

8.1  SENSITIVITY  TO  ASSOCIATION  THRESHOLDS 

The  technique  described  in  this  report  and  contained  in  the  software  package  has  two  parameters 
that  can  affect  the  estimated  information  theoretic  measures.  The  first  parameter  is  the  total  number  of 
counts  that  the  accumulation  matrix  contains.  The  second  applies  to  the  algorithm  that  associates 
estimated  tracks  to  truth  tracks  over  a  number  of  time  frames.  There  is  an  association  threshold  parameter 
used  to  determine  when  an  estimated  track  should  be  associated  with  a  truth  track  or  considered  to  be 
unassociated  with  any  truth  track.  These  parameters  are  not  required  for  the  construction  of  association 
matrices  for  classifiers. 

The  software  package  contains  a  function  that  performs  the  track-to-track  association  using  a 
probabilistic  cost  function  and  a  function  that  performs  the  linear  assignment  of  estimated  tracks  to  truth 
tracks.  Other  researchers  may  wish  to  use  their  own  association  cost  function  and  assignment  algorithms 
to  construct  their  accumulation  matrices.  The  sensitivity  of  the  information  theoretic  results  to  changes  in 
the  association  threshold  and  total  count  size  are  examined  next.  Results  should  be  qualitatively  similar  to 
other  cost  functions  and  association  algorithms. 

Figure  16  shows  the  information  coverage  plot,  parameterized  for  two  different  simulated  trackers 
as  a  function  of  varying  association  threshold.  As  the  association  threshold  is  increased,  more  sensor 
tracks  associate  with  the  truth  tracks  causing  measured  performance  to  generally  improve  for  both 
trackers.  Tracker  1  is  apparently  better  than  Tracker  2  in  this  plot  for  all  values  of  the  association 
thresholds  that  were  tested.  The  figure  shows  that  the  error  covariances  tend  to  increase  slightly  as  the 
thresholds  are  increased.  The  error  ellipses  do  not  provide  strong  visual  evidence  for  Tracker  1  being 
declared  significantly  better  than  Tracker  2. 
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Figure  16.  The  information  coverage  plot  for  the  two  simulated  trackers  for  different  values  for  the  association 
threshold  that  is  used  to  construct  the  accumulation  matrix  used  to  calculate  the  information  theoretic  measures. 
95%  confidence  error  ellipses  are  shown. 


Figure  17  shows  the  total  conditional  entropy  for  different  values  of  the  association  threshold.  Total 
conditional  entropy  decreases  as  the  association  threshold  is  increased,  as  would  be  expected  from  the 
data  plotted  in  Figure  16.  The  error  bars  that  are  plotted  are  the  95%  confidence  intervals  and  assume  that 
the  distributions  are  Gaussian.  Again,  Tracker  1  is  shown  to  be  performing  better  than  Tracker  2  for  all 
tested  thresholds.  Again,  the  naive  interpretation  of  the  error  ellipses  would  imply  that  there  is  weak 
statistical  evidence  that  Tracker  1  is  better  the  Tracker  2.  Evaluation  of  errors  associated  with  total 
conditional  entropy  provides  a  better  assessment  of  which  tracker  is  better. 
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Figure  1 7.  The  total  conditional  entropy  for  the  two  simulated  trackers  for  different  values  for  the  association 
threshold  that  is  used  to  constmct  the  accumulation  matrix  used  to  calculate  the  information  theoretic  measures. 
95%  confidence  error  bars  are  shown. 


Table  2  shows  the  values  plotted  in  Figure  17,  as  well  as  the  probability  that  an  incorrect  decision 
may  have  been  made  as  to  which  is  the  better  tracker.  The  error  values  are  the  one-standard  deviation 
errors.  For  the  data  points  that  are  evaluated,  the  decision  error  peaks  at  a  value  of  0.103,  or  about  10%, 
for  an  association  threshold  value  of  0.95.  The  decision  error  decreases  significantly  as  the  association 
threshold  is  increased.  This  suggests  that  some  care  should  be  taken  when  selecting  the  association 
threshold  for  testing  trackers  so  that  the  thresholds  accurately  reflect  the  true  nature  of  the  data  sets  and 
tracking  systems,  especially  with  regard  to  the  probabilities  for  missed  detections  and  false  tracks.  It  may 
be  worth  analyzing  the  sensitivity  to  the  association  threshold  to  support  the  decision  to  select  one  tracker 
over  another.  It  is  highly  unlikely  that  the  decision  for  the  better  tracker  will  switch  as  the  threshold  is 
changed.  An  accurate  assessment  of  the  probability  of  a  wrong  decision  may  be  important  information  in 
the  selection  of  a  tracker  or  in  the  decision  to  evaluate  the  trackers  with  larger  data  sets. 
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TABLE  2 


Total  Conditional  Entropy  and  Probability  Estimates  of  Decision  Error  for  Different 

Association  Thresholds 


Association 

Threshold 

Tracker  1  Total 
Conditional  Entropy 

Tracker  2  Total 
Conditional  Entropy 

Probability  of 
Decision  Error 

0.850 

0.0421  ±0.0029 

0.0505  ±0.0032 

0.026016 

0.900 

0.0348  ±0.0027 

0.0408  ±  0.0029 

0.064873 

0.950 

0.0251  ±0.0023 

0.0293  ±0.0024 

0.102866 

0.990 

0.0142  ±0.0015 

0.021 7  ±0.001 9 

0.001059 

0.999 

0.0121  ±0.0013 

0.021 7  ±0.001 9 

0.000020 

8.2  SENSITIVITY  TO  THE  TOTAL  NUMBER  OF  COUNTS 

A  similar  sensitivity  analysis  can  be  performed  for  the  total  number  of  counts  in  the  accumulation 
matrix.  For  trackers,  this  is  usually  estimated  from  information  about  the  state  space  of  the  data  sets  being 
used  in  the  evaluation.  Each  dimension  in  the  state  space  of  the  test  data  set  contributes  a  multiplicative 
factor  to  the  overall  count.  The  contribution  of  quantized  dimensions  in  the  state  space  is  the  product  of 
the  number  of  elements  for  those  dimensions.  For  real- valued  dimensions,  the  contribution  is  determined 
from  the  ratio  of  the  ranges  of  the  dimensions  versus  the  desired  resolutions  in  those  dimensions.  The 
product  of  the  quantized  numbers  for  all  the  dimensions  estimates  the  total  number  of  counts  that  the 
accumulation  matrix  should  contain.  In  practice,  the  accumulation  matrices  are  filled  by  comparing  truth 
and  sensor  measurements  to  populate  the  bins.  The  (1,  1)  bin  is  filled  with  the  difference  between  the  total 
number  of  counts  estimated  for  the  state  space  and  the  total  number  of  counts  entered  in  all  the  other  bins 
in  the  array.  This  causes  the  total  number  of  counts  in  the  entire  matrix  to  equal  the  total  number  of 
possible  distinguishable  states  in  the  state  space  of  the  test  data  set.  For  classifiers,  the  total  number  of 
counts  is  entirely  set  by  the  size  of  the  data  sample.  Larger  data  samples  may  be  required  to  obtain  enough 
statistical  significance  for  a  test. 

The  sensitivity  analysis  presented  here  varies  the  total  number  of  counts  for  the  simulated  data  set 
which  hypothetically  generated  the  two  track  data  sets  used  as  examples  in  this  report.  The  association 
threshold  for  this  sensitivity  analysis  was  selected  to  be  a  constant  value  of  0.95  because,  in  the  previous 
sensitivity  analysis,  this  produced  the  highest  probability  that  Tracker  2  could  be  better  than  Tracker  1 
(for  the  values  evaluated).  The  information  theoretic  measurements  and  errors  were  estimated  for 
different  values  for  the  total  number  of  counts.  The  total  number  of  counts  were  scaled  by  factors  of  1/10, 
1,  and  10,  and  the  accumulation  matrices  for  Trackers  1  and  2  adjusted  to  produce  the  desired  counts.  In 
all  cases,  the  (1,1)  bin  in  the  accumulation  matrix  was  modified  to  get  the  desired  total  number  of  counts. 
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Given  an  original  total  number  of  counts  of  40,000,  the  three  analyses  were  conducted  for  accumulations 
matrices  with  4,000,  40,000,  and  400,000  counts.  Figure  18  shows  the  information  coverage  plot  for  the 
three  different  scales  of  the  total  number  of  counts.  As  the  total  number  of  counts  increases,  the 
information  completeness  increases  and  the  false  information  ratio  decreases.  The  error  covariances 
appear  to  increase  slightly  as  the  total  number  of  counts  increases. 


Figure  18.  The  information  coverage  plot  for  two  simulated  trackers  with  different  multipliers  on  the  total  number  of 
counts  in  the  accumulation  matrix. 


Figure  19  shows  the  total  conditional  entropy  for  the  three  different  scale  factors.  A  log-log  plot  is 
used  because  of  the  range  of  scale  factors  and  because  the  total  conditional  entropy  approximately  scales 
as  the  inverse  of  the  total  number  of  counts.  The  difference  in  performance  between  the  two  trackers 
widens  as  the  scale  factor  increases. 
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Figure  19.  The  total  conditional  entropy  for  the  two  simulated  trackers  for  different  scales  on  the  total  counts  in  the 
accumulation  matrix.  The  original  total  number  of  counts  was  40,000. 


Table  3  shows  the  total  conditional  entropy  for  the  two  trackers,  with  one-standard  deviation  errors 
for  three  different  scale  factors  on  the  total  number  of  counts.  The  decision  error  probability  for  selecting 
the  poorer  performing  tracker  as  the  better  tracker  is  also  shown.  The  probability  decreases  as  the  total 
number  of  counts  increases  by  approximately  the  same  order  of  magnitude.  The  results  imply  that 
accurate  estimates  of  the  total  number  of  counts  for  the  data  sets  are  required  to  generate  relatively 
accurate  estimates  of  the  decision  error  probabilities.  A  sensitivity  analysis  can  be  performed  if  there  are 
doubts  on  the  appropriate  choice  for  total  number  of  counts.  The  data  set  can  also  be  expanded  to  increase 
the  statistical  weight  of  the  evaluation.  For  this  example,  if  a  decrease  in  the  association  threshold  to  0.95 
produces  a  decision  error  probability  of  about  10%  and  with  a  further  decrease  in  the  expected  total 
number  of  counts  by  a  factor  of  1/10  produces  a  decision  error  probability  of  30%,  a  cautious  researcher 
who  cannot  defend  more  stringent  limits  for  the  association  threshold  and  total  number  of  counts  might 
elect  to  process  additional  data  sets  to  drive  down  the  errors  so  that  the  better  tracker  can  be  confidently 
identified. 
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TABLE  3 


Total  Conditional  Entropy  and  Probability  Estimates  of  Decision  Error  for  Different 

Scales  of  Total  Number  of  Counts 


Total  Count 
Scale  Factor 

Tracker  1  Total 
Conditional  Entropy 

Tracker  2  Total 
Conditional  Entropy 

Probability  of 
Decision  Error 

0.1 

0.2092610.01756 

0.22247  +  0.01715 

0.295211 

1.0 

0.0251410.00226 

0.02928  ±0.00237 

0.102866 

10.0 

0.00292  1  0.00027 

0.00361  ±0.00030 

0.044895 

With  the  availability  of  errors  on  the  information  theoretic  measures,  it  is  now  possible  to  determine 
the  significance  of  the  results  of  tracker  and  classifier  performance  analyses.  It  is  also  possible  to 
determine  the  sensitivity  of  the  analyses  to  parameters  and  algorithms  in  the  assessment  code  itself.  The 
provided  examples  of  sensitivity  analyses  show  that  these  influences  can  have  a  significant  influence  on 
the  significance  of  the  results.  The  selection  of  the  association  threshold  and  total  number  of  counts 
parameters  should  be  motivated  by  sound  reasoning  so  that  analysis  results  are  reliable.  If  the  estimated 
accuracies  of  these  parameters  are  poor,  a  sensitivity  analysis  should  be  performed  to  ensure  that  the 
better  tracker  is  identified  with  an  acceptable  level  of  confidence.  If  confidence  is  low,  additional  data  sets 
can  be  analyzed  to  drive  down  the  statistical  errors. 
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9.  SOFTWARE  FOR  THE  EVALUATION  OF  TRACKING  AND 
CLASSIFICATION  ALGORITHMS 


A  software  package  has  been  developed  in  MATLAB  for  other  researchers  to  use  in  their 
evaluation  of  tracking  and  classification  algorithms.  The  package  contains  a  directory  tree:  the  top  folder, 
which  contains  a  pdf  file  of  a  paper  from  the  International  Conference  on  Computer  Vision  2009  [1],  a 
pdf  file  of  this  technical  report,  and  two  folders:  ‘code’  and  ‘truthData.’ 

The  folder  ‘truthData’  contains  a  MATLAB  file,  testdata.mat,  which  contains  test  data  that  are  used 
by  test  functions  in  the  ‘code’  folder  to  check  that  the  algorithms  are  running  correctly.  It  also  contains 
the  file  dataGenerator.m,  which  was  used  to  generate  testdata.mat.  This  file  generates  a  simple,  contrived 
set  of  simulated  truth  and  tracker  generated  tracks  similar  to  what  might  be  generated  by  a  real  tracking 
algorithm.  This  data  set  was  used  to  produce  the  information  coverage  plots,  significance  tests,  and 
sensitivity  analyses  in  this  document.  Various  deficiencies  are  intentionally  placed  into  the  simulated 
tracker  tracks.  The  file  can  also  be  used  as  a  template  to  construct  data  structures  from  other  truth  data 
sets  and  trackers  so  that  the  function  calls  that  associate  the  sensor  and  truth  tracks  and  generate 
information  theoretic  algorithms  can  be  more  easily  called. 

The  folder  ‘code’  contains  the  file  readme.txt  and  eight  MATLAB  .m  files: 

associateSystemAndTruthTracks.nl 

assocLapmod.m 

computeMetricsFromAccumMat.m 

Ellipse_plot.m 

lapmodAssocSparse.m 

plotTrackerEvalResults.nl 

slOWolpert_Wolf.ni 

trackerEvalTestScript.nl 

The  file  readme.txt  contains  information  to  help  users  to  begin  using  the  software  package. 

The  key  MATLAB  function  for  the  user  is  computeMetricsFroniAccumMat.nl.  This  function 
estimates  the  information  theoretic  measures  from  accumulation  matrices.  The  function 
associateSystemAndTruthTracks.nl  may  be  useful  for  those  users  intending  to  evaluate  trackers.  The 
function  plotTrackerEvalResults.nl  may  be  useful  for  plotting  tracker  metrics.  The  function 
trackerEvalTestScript.m  is  provided  to  test  the  software.  This  function  should  be  run  from  the  ‘code’ 
directory. 

The  other  MATLAB  files  are  support  functions,  and  most  users  will  not  need  to  concern  themselves 
with  the  details  in  the  function.  The  function  slOWolpert_Wolf.ni  contains  the  code  that  generates 
information  theoretic  parameters  from  accumulation  matrices.  The  code  in  this  function  is  derived  from 
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the  equations  contained  in  this  document.  The  MATLAB  functions  assocLapmod.m  and 
lapmodAssocSparse.m  contain  code  to  execute  a  linear  assignment  algorithm  based  upon  an  algorithm 
developed  by  Volgenant  [14], 

9.1  USE  OF  C OMPUTEME TRIC SFROMACCUMMAT.M 

The  function,  computeMetricsFromAccumMat.nl,  accepts  up  to  three  input  variables: 

accumMat  -  A  matrix  containing  the  counts  of  the  number  of  times  that  the  classes  of  two  sets  of  objects 
were  associated.  The  matrix  must  contain  non-negative  real  values. 

prior  -  [Optional]  The  value  of  the  prior  distribution  factor  to  apply  to  accumMat.  This  may  be  entered 
as  a  string:  ‘Haldane,’  ‘Perks,’  ‘Jeffrey,’  ‘Bayes,’  ‘uniform’  (default),  a  non-negative  real  value, 
or  a  non-negative  real  matrix  that  is  the  same  size  as  accumMat. 

doCovs  -  [Optional]  A  flag  to  cause  the  function  to  calculate  and  return  a  covariance  matrix,  if  true.  The 
default  is  false. 

For  the  evaluation  of  multi-target  trackers,  the  accumulation  matrix,  accumMat,  is  constructed  so 
that  the  first  row  and  column  store  the  counts  of  the  number  of  times  that  truth  tracks  or  system  tracks  did 
not  associate  with  a  corresponding  track  in  the  other  set.  The  (1,  1)  element  is  filled  with  a  count  of  the 
number  of  true  negatives.  This  value  is  estimated  by  quantizing  the  state  space  used  to  associate  the  two 
sets  of  tracks.  The  total  number  of  counts  in  the  accumulation  matrix  is  estimated  from  the  total  practical 
volume  of  the  state  space  divided  by  the  typical  resolution  of  the  states  for  the  two  sets  of  tracks.  The  sum 
of  all  cells  in  accumMat  is  expected  to  equal  this  number,  and  cell  (1,  1)  is  set  to  satisfy  this  condition. 

For  the  evaluation  of  classifiers,  the  accumulation  matrix,  accumMat,  is  constructed  so  that  each 
row  and  column  corresponds  to  the  number  of  counts  where  the  data  for  truth  class  i  was  associated  with 
decision  class  j . 

The  input  variable,  prior,  defaults  to  uniform  or  Bayes’  prior.  Strings  can  be  entered  to  select  the 
kind  of  prior  that  a  user  wishes  to  adopt  for  their  calculations.  The  string  ‘Haldane’  will  add  zero  (0)  to  all 
the  entries  in  the  accumulation  matrix.  The  string  ‘Bayes’  or  ‘uniform’  will  add  one  (1)  to  all  entries  in 
the  matrix.  The  string  ‘Jeffrey’  will  add  1/2  to  all  the  entries  in  the  matrix.  The  string  ‘Perks’  will  add 
l/(NxM)  to  all  the  entries  in  the  matrix,  where  N  and  M  are  the  sizes  of  the  matrix  dimensions.  Users  can 
provide  a  non-negative  real  value  that  will  be  added  to  all  entries  in  the  matrix  accumMat  or  a  non¬ 
negative  real  matrix  of  priors  that  will  be  added  to  the  matrix  accumMat.  The  matrix  of  priors  must  be  the 
same  size  as  the  accumulation  matrix. 

The  Perks’  and  Haldane’s  priors  have  special  importance  in  information  theory  because  only  the 
Perks’  prior  and  the  Haldane’s  prior  will  result  in  the  same  calculations  of  the  entropies,  H(x)  and  H(y), 
for  different  numbers  of  dimensions.  For  example,  if  one  experimenter  has  access  to  the  two-dimensional 
accumulation  matrix  and  another  experimenter  only  has  access  to  the  counts  associated  with  one  of  the 
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dimensions,  they  can  only  be  guaranteed  to  calculate  the  same  entropy  values  for  the  common  dimension 
if  they  both  use  either  Haldane’s  prior  or  Perk’s  prior.  All  other  priors  will  in  general  cause  the  two 
experimenters  to  calculate  different  values.  Haldane’s  prior  leaves  the  prior  adjusted  counts  unchanged, 
so  it  is  obvious  why  the  entropy  calculations  are  consistent.  Perk’s  prior  is  different  and  a  short 
explanation  can  clarify  why  this  is  the  case.  For  the  Perks’  prior,  the  first  experimenter  will  add  1/(N  x  M) 
to  each  two-dimensional  cell,  while  the  second  experimenter  will  add  1/N  to  each  one-dimensional  cell. 
When  the  first  experimenter  calculates  vit  for  use  in  Equation  (16),  his  summation  over  M  will  produce 
values  equal  to  the  second  experimenter’s  values  for  vt ,  thus  producing  the  same  estimates  for  the 
entropy.  This  is  especially  useful  for  the  evaluation  of  trackers,  because  different  trackers  will  usually 
generate  different  numbers  of  tracks  and  accumulation  matrices  of  different  sizes.  In  general,  the  selection 
of  other  priors  will  cause  the  estimates  of  the  truth  entropy  to  differ  between  the  two  experimenters  [15]. 

It  is  recommended  that  the  input  variable  dCovs  be  allowed  to  default  to  ‘false’  when  preliminary 
calculations  are  being  generated.  Set  the  flag  to  ‘true’  when  covariance  matrices  are  actually  needed.  The 
calculation  of  the  covariance  matrices  can  take  a  significant  amount  of  time  to  complete,  depending  upon 
the  number  of  counts  and  the  size  of  the  matrix,  accumMat. 


The  output  variable  is  a  structure: 


result  -  result  structure  with  the  following  fields: 


.infoCompleteness 

falselnfoRatio 

.truthPurity 

.trackPurity 

.truthCompleteness 

.  trackCompleteness 

.trackLenHist 

.means 

.sList 

.covs 


-  Information  completeness 

-  False  information  ratio 

-  Truth  purity  (also  known  as  one-to-one  completeness) 

-  Trackpurity 

-  Truth  completeness  (also  known  as  many-to-many  completeness) 

-  Track  completeness 

-  The  associated  track  length  histogram  (in  frames) 

-  The  means  of  the  information  theoretic  measures 

-  Text  describing  the  contents  of  the  variable  ‘means’ 

-  The  covariance  matrix  for  the  means,  it  is  returned  if  doCovs  flag  is 
‘true’ 


The  field  infoCompleteness  is  the  ratio  of  the  expectation  of  mutual 
expectation  for  the  truth  entropy: 


information  versus  the 


Ic=E{l{x;y))/E{H{x)),  (144) 

where 

I(x;  y)  =  H(x)  +  H(y)  -  H(x,  y) 

so  (145) 

E(l(x-,y))  =  E(h(x )  +  //(>')  -  H(x,y )) . 
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The  field,  falselnfoRatio,  is  the  ratio  of  the  expectation  of  the  conditional  entropy  of  the  system  data 
given  the  truth  data  versus  the  expectation  of  the  truth  entropy: 

RFI=E{H{y\x))lE{H{x)).  (146) 


The  form  of  the  equation  for  the  purity  measure  for  individual  system  track  associated  with  column 

j  is 


PJ  = 


max{v2j,v3J,...,vMj) 


M 

X 


vij 


(147) 


where  V(/  is  the  accumulation  matrix.  The  maximization  over  the  indices  can  be  swapped  between  i  and 
j  to  obtain  individual  truth  track  purity  measures  for  truth  tracks  associated  with  the  corresponding  rows 
in  the  accumulation  matrix. 


The  overall  purity  for  the  field  trackPurity  is  calculated  with 


P 


1  N  ■ 

-Y- 


max1 


M 

I' 

i=l 


(148) 


where  the  indices  for  the  sum  and  maximization  function  are  swapped  to  obtain  the  field  truthPurity. 

The  form  of  the  equation  for  the  completeness  measure  for  the  individual  system  track  associated 
with  column  j  is 


C,  - 


M 

S' 

1=2 


J  M 


S' 

i=l 


(149) 


The  indices  for  the  sums  are  swapped  to  produce  the  corresponding  measure  for  individual  truth  tracks. 
The  equation  for  the  overall  completeness  field,  trackCompleteness,  is 


(150) 
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The  indices  over  the  sums  are  again  swapped  to  produce  the  statistics  for  the  field  truthCompleteness . 

The  field  trackLenHist  is  a  histogram  of  the  lengths  of  the  tracks  provided  by  the  tracker.  This  is 
effectively  the  number  of  counts  in  each  column  of  the  accumulation  matrix.  Some  researchers  consider 
track  length  to  be  a  useful  indicator  of  tracker  performance. 

The  field  means  is  a  1  x  7  vector  of  all  the  information  theoretic  quantities  that  are  calculated  by  the 
function,  computeMetricsFromAccumMat.nl.  The  values  returned  in  means  are  the  total  entropy,  H[x,y), 
the  truth  entropy,  H(x),  the  tracker  entropy,  H  (v),  the  mutual  information,  l(x;y),  the  truth  conditional 
entropy,  H{x  \  y) ,  the  sensor  conditional  entropy,  !l(y  |  x) ,  and  the  total  conditional  entropy, 
H{x\  y)+H(y\x). 

The  field  sList  is  a  cell  array  of  strings  defining  the  variables  in  means.  It  is  intended  to  be  used  to 
define  axis  labels  when  plotting  data  in  MATLAB. 

The  field  covs  is  a  7  x  7  covariance  matrix  for  the  vector  means.  It  should  be  remembered  that  there 
are  only  three  independent  variables  in  the  7x7  matrix.  The  estimation  of  eigenvalues  and  eigenvectors 
for  covariance  matrices  will  only  return  meaningful  results  if  three  or  fewer  rows  and  columns  are 
selected  for  the  operation. 

9.2  USE  OF  ASSOCIATESYSTEMANDTRUTHTRACKS.M 

The  function  associateSystemAndTruthTracks.nl  is  used  to  construct  the  accumulation  matrix  that 
is  used  as  input  into  computeMetricsFromAccumMat.nl.  It  is  used  to  evaluate  the  performance  of  multi¬ 
target  trackers  and  is  not  used  with  classifiers.  It  establishes  a  frame-by-frame  association  between  system 
tracks  and  truth  tracks  and  populates  the  accumulation  matrix  by  counting  the  number  of  times  that  truth 
and  system  tracks  are,  or  are  not,  assigned  to  each  other. 

function  accuniMat  =  associateSystemAndTruthTracks(systemTracks,  truthTracks,  stateSpaceSize, 
removeUnassociatedSystemTracks,  confidence) 


Input: 

systemTracks  -  An  array  of  tracks  from  the  tracker  indexed  by  sample  frames. 

Required  fields:  (m  tracks  in  this  frame) 

.id  (m)  -  track  ids 

.mean  (m  x  d)  -  track  states  (d  states) 

.cov  (m  x  d  x  d)  -  track  covariance 
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truthTracks  -  An  array  of  truth  tracks  indexed  by  sample  frames. 


Required  fields:  (n  tracks  in  this  frame) 

.id  (n)  -  track  ids 

.mean  (nxd)  -  track  states  (d  states) 

.cov  (n  x  d  x  d)  -  track  covariance 

stateSpaceSize  -  The  size  of  the  state  space  (the  number  of  distinguishable  objects  that  fit  in  the  state 
space).  This  is  typically  a  very  large  number. 

removeUnassociatedSystemTracks  -  A  flag  to  indicate  whether  to  remove  unassociated  system  tracks. 
This  is  useful  when  the  evaluator  has  incomplete  truth  and  wants  to  avoid  penalizing  the 
observation  of  untruthed  tracks  (optional:  default  =  false). 

confidence  -  The  association  gating  parameter  (optional:  default  =  0.99).  Valid  values  range  from  0  to 
1 .  The  inverse  chi-squared  function  is  used  to  set  the  internal  threshold  and  accounts  for  the 
dimensionality  of  the  state  space  (increasing  confidence  leads  to  a  larger  gate  and  looser 
associations). 


Output: 

accumMat  -  A  (w+1  x  z+1)  dimensional  matrix,  where  w  is  the  total  number  of  truth  tracks  and  z  is 
the  total  number  of  system  tracks.  The  first  row  contains  the  counts  of  the  number  of  times 
each  sensor  track  failed  to  associate  with  any  truth  tracks  (false  system  tracks).  The  first 
column  contains  the  counts  of  the  number  of  times  each  truth  track  failed  to  associate  with  any 
system  tracks  (missed  truth  tracks).  Cell  (1,  1)  contains  the  total  number  of  true  negative 
counts  and  the  function  uses  the  input  variable,  stateSpaceSize,  to  estimate  this  number. 

The  function  assumes  that  the  systemTracks  and  truthTracks  are  sampled  at  the  same  times  and 
have  the  same  array  size  (i.e.,  number  of  sample  frames).  It  is  assumed  that  the  state  spaces  of  the  system 
and  truth  tracks  are  consistent. 

This  function  calls  assocLapmod.m,  which  in  turn  calls  lapmodAssocSparse.m  to  perform  the  truth- 
to-tracker  association  with  a  linear  assignment  algorithm  that  associates  the  tracks  based  upon  their 
covariance  weighted  Mahalanobis  distances  and  the  association  confidence  value. 

9.3  CODING  TECHNIQUES  IN  SIOWOLPERT  WOLF.M 

Although  the  casual  user  may  not  be  interested  in  the  details  of  slOWolpert_Wolf.ni,  some  details 
are  provided  for  more  serious  users  who  may  wish  to  validate  or  improve  on  the  accuracy  of  the 
covariance  estimates  that  are  returned  by  computeMetricsFromAccumMat.nl. 
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The  functions  for  AO ^\x,z)  and  A®!2l(v,  z)  have  been  coded  as  special  algorithms  to  reduce 
numerical  precision  errors.  The  MATLAB  psi  function  is  somewhat  inaccurate  for  estimating  differences 
when  both  x  and  z  are  large  and  close  to  each  other.  When  x  and  z  are  within  90%  of  each  other  and  less 
than  1 08,  the  iteration  rule  of  Equation  (61)  is  used  to  sum  terms  until  the  difference  between  x  and  the 
modified  z  are  less  than  one  (1).  The  MATLAB  psi  function  is  then  used  to  account  for  the  fractional 
differences  between  the  two  modified  values.  Otherwise,  the  MATLAB  psi  function  is  used  to  directly 
estimate  AO ^(x,z).  Equivalent  summations  are  performed  for  A®^(x,z)  under  the  same  conditions. 
The  code  can  be  found  in  slOWolpert_Wolf.ni  and  the  recursion  relationships  in  Abramovitz  and  Stegun 
[12]- 


The  terms  that  involve  summations  that  contribute  to  the  covariance  term,  cr^  ,  are  implemented  as 
recursive  functions.  The  terms  are  grouped  into  pairs  of  sub-terms  with  a  common  factor  of  l/(k*(k  +  1)) 
to  maximize  the  cancellation  between  sub-terms.  Common  constant  terms  in  the  two  sub-terms  are  moved 
outside  of  the  summation,  causing  the  first  sub-term  within  the  summation  to  have  a  value  of  one  (1)  and 
the  second  sub-term  to  be  a  function  of  gamma  function  ratios  and  other  multiplicative  terms.  The 
functions  are  written  to  recursively  call  themselves  when  the  value  for  the  next  term  in  the  summation 
drops  below  a  defined  percentage  of  the  accumulated  sum  of  terms  at  the  current  level  in  the  recursion. 
The  called  function  begins  to  sum  another  block  of  terms.  When  the  second  sub-term  drops  below  an 
internally  defined  value  (currently  the  double  precision  epsilon  value  in  MATLAB,  ~2.2e-16),  the 
recursive  call  terminates,  and  the  collected  partial  sum  at  each  level  in  the  recursion  is  combined  with  the 
returned  sum  from  the  lower  level  to  compute  the  overall  sum.  The  recursive  summation  significantly 
improves  the  accuracy  of  the  overall  result.  The  recursive  calls  also  return  the  number  of  summations  that 
were  performed.  When  the  summation  terminates,  the  first  term  (with  a  value  of  one)  totally  dominates 
the  uncompleted  summation  to  infinity.  Its  contribution  can  be  accounted  for  by  an  analytical  solution  to 
the  sum  of  l/(k*(k  +  1))  from  q  to  infinity  using 

&0TTT  <151) 

The  recursive  summation  reduces  the  accumulated  round-off  to  some  degree  while  still  allowing  the 
gamma  ratio  terms  to  be  calculated  as  simple  multiplications  with  previously  accumulated  gamma  ratio 
terms. 


If  the  recursion  level  reaches  an  internally  defined  depth  without  the  second  term  dropping  below 
the  termination  value,  the  recursive  function  calls  an  approximation  function  that  interpolates  terms  across 
multiple  step  sizes.  The  step  size  is  continually  increased  such  that  the  function  call  will  terminate  after  a 
reasonable  number  of  steps.  The  function  also  terminates  if  the  second  term  drops  below  the  internally 
defined  value  (currently  the  double  precision  epsilon  value  in  MATLAB,  ~2.2e-16). 

To  interpolate  the  second  term  in  the  approximation  function  if  the  recursive  call  goes  too  deep, 
internal  functions  are  used  to  calculate  the  double  gamma  ratio  terms  instead  of  using  recursive 
multiplication.  The  approximation  for  gamma  ratio  terms  is  taken  from  Abramovitz  and  Stegun  [12], 
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although  other  approximations  are  now  available  that  could  be  more  advantageous  if  more  speed  or 
accuracy  is  required  [16-18]. 

The  portion  of  the  code  in  slOWolpert_Wolf.ni  that  calculates  <r  constructs  an  array  of  values  for 
the  unique  sets  of  ,  v,n ,  and  vin  for  all  the  terms  in  the  accumulation  matrix.  The  function  swaps  vim 
and  v,n  to  have  the  largest  term  first  because  the  equation  is  symmetric  in  the  exchange  of  these  terms. 
Numerical  computation  is  more  accurate  with  the  largest  term  first  in  the  list.  The  unique  list,  with 
potentially  swapped  ia.  and  v,n  terms,  is  then  used  to  estimate  the  contribution  of  each  set  of  elements  to 
avoid  identical  computations.  This  dramatically  speeds  up  execution  of  the  code  for  highly  symmetric 
accumulation  matrices. 
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10.  CONCLUSION 


Error  estimates  on  information  theoretic  measures  are  required  to  assess  the  statistical  significance 
of  tests  on  the  relative  performance  of  multi-target  trackers  and  classifiers.  The  papers  of  Wolpert  and 
Wolf  provide  exact  formulae  to  estimate  information  theoretic  measures  and  errors  from  data  samples. 
Direct  conversion  of  the  equations  into  computer  code  provided  good  estimates  of  the  information 
theoretic  means  and  errors  for  sample  sizes  on  the  order  of  thousands,  but  became  inaccurate  for  larger 
sample  sizes.  Reformulation  of  the  equations  to  take  advantage  of  cancellation  between  sub-terms  in  the 
equations  has  significantly  improved  the  numerical  precision  of  the  computer  code.  The  revised  code 
provides  a  reasonable  level  of  accuracy  for  all  data  sample  sizes  and  should  meet  the  accuracy  needs  of 
most  researchers  who  evaluate  multi-target  trackers  and  classifiers.  The  computer  code  makes  it  possible 
to  obtain  a  first-order  estimate  of  the  significance  of  analysis  results  and  to  determine  if  more  data  may  be 
needed  to  support  the  conclusions  of  an  analysis.  The  computer  code  may  be  useful  to  researchers  who 
calculate  information  theoretic  measures  from  data  samples  for  other  puiposes  besides  decision  system 
assessments. 

This  report  has  focused  on  methods  to  estimate  reasonably  accurate  numbers  for  the  information 
theoretic  mean  and  covariance  error  terms.  The  study  has  proceeded  with  the  assumption  that  the  samples 
used  to  construct  the  accumulation  matrices  are  statistically  independent.  Statistical  independence  of  data 
samples  for  the  analysis  of  classification  algorithms  can  be  achieved  by  taking  care  to  select  independent 
test  samples.  Statistical  correlation  in  the  analysis  of  tracking  algorithms  is  much  more  difficult  to  avoid. 
The  analysis  method  relies  on  the  fact  that  tracker  pathologies  like  track  swaps,  missed  tracks,  and  extra 
tracks  will  spread  counts  across  the  accumulation  matrix  and  increase  the  total  conditional  entropy.  When 
a  tracker  is  performing  well,  the  estimated  states  for  one  time  frame  will  be  highly  correlated  with  the 
next  frame,  and  lead  to  some  degree  of  coupling  in  the  counts  in  the  accumulation  matrix.  Additional 
research  is  required  to  identify  how  to  quantitatively  account  for  the  influence  of  this  coupling  so  that  the 
significance  of  the  test  results  can  be  accurately  determined.  While  appropriate  procedures  are  identified, 
one  technique  to  reduce  the  impact  of  this  systematic  coupling  is  to  use  multiple,  independent  regions  to 
assess  tracker  performance.  The  information  theoretic  measures  from  each  region  will  be  statistically 
independent  and  can  be  added  together  to  get  an  overall  performance  assessment  with  greater  accuracy 
and  significance. 

An  additional  area  where  effort  could  be  devoted  is  in  modifications  to  the  software  package  to 
improve  the  execution  speed.  Conversion  of  the  computer  code  from  MATLAB  to  C  or  Java  should 
significantly  improve  execution  time.  If  required  by  some  applications,  additional  speed  could  be 
achieved  by  conversion  of  the  algorithm  to  execute  on  parallel  processors.  This  might  be  useful  for 
estimating  information  theoretic  measures  for  very  large  dimension  accumulation  matrices.  Other  tweaks 
in  performance  can  be  made  by  modifying  the  constants  used  to  terminate  loops  and  initiate  recursive 
calls,  thereby  trading  numerical  precision  for  speed.  These  trades  are  likely  to  be  application  dependent. 
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As  a  final  note,  an  interesting  discovery  made  during  the  course  of  improving  the  numerical 
accuracy  of  Wolpert  and  Wolfs  equations  was  that  the  use  of  Perks’  prior  or  Haldane’s  prior  causes  the 
algorithm  to  produce  consistent  estimates  of  information  theoretic  measures  for  different  dimensions  [15]. 
The  entropy  values  calculated  with  Wolpert  and  Wolfs  formulae  for  two  dimensions  produce  the  same 
values  as  those  produced  if  only  one  dimension  was  used  for  the  calculation  of  the  entropies.  This  means 
that  consistent  information  theoretic  measures  will  be  calculated  with  Perks’  and  Haldane’s  priors 
regardless  of  the  number  of  nuisance  dimensions  that  might  be  included  in  the  calculations. 
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